Module 3

Data Science/Machine Learning

3.1

3.1.1 Data Carpentry (From Schiller)

Data Carpentry teaches the fundamental data skills needed to conduct research. It provides general and domain-specific training for data-driven research. The focus is on introductory computational skills for data management and analysis that enable learners to quickly apply the content to their own research. The target audience is learners who have little to no prior computational experience. Curriculum materials are available online at Link.

3.1.2 Mathematical foundation of data science and machine learning (From Qi Wang)

This is an undergraduate course, MATH 528, offered at UofSC. The course content includes basic information theory, unconstrained and constrained optimization, gradient descent methods for numerical optimization, supervised and unsupervised learning, various reduced-order methods, sampling and inference, Monte Carlo methods, and deep neural networks.
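
As a rough illustration of one of the listed topics, gradient descent for unconstrained optimization, the following minimal sketch minimizes a simple quadratic function. The objective, step size, and stopping tolerance are assumptions chosen for illustration and are not taken from the course materials.

```python
def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Minimize a differentiable function using only its gradient.

    grad : function mapping a point (list of floats) to its gradient
    x0   : starting point
    step : fixed step size (assumed small enough for convergence)
    """
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        # Stop once the gradient norm is close to zero.
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        # Move against the gradient direction.
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical objective: f(x, y) = (x - 1)^2 + 2 (y + 3)^2,
# whose unique minimizer is (1, -3).
def grad_f(p):
    x, y = p
    return [2.0 * (x - 1.0), 4.0 * (y + 3.0)]


minimizer = gradient_descent(grad_f, [0.0, 0.0])
print(minimizer)
```

A fixed step size is used here for simplicity. It works for this well-conditioned quadratic, but in general the step size has to be tuned or chosen by a line search.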

The course will be taught by Qi Wang and the course materials will be available at www.math.sc.edu/~qwang.

3.1.3 Machine learning (From Getman)

  • https://www.coursera.org/learn/machine-learning This course by Andrew Ng is a good starting point for machine learning. It covers the basic ideas of machine learning and a range of algorithms.
  • Link This YouTuber also gives a solid course on machine learning with Python.
  • If you want to practice on machine learning projects, you can go to Kaggle, a website that hosts machine learning competitions and many datasets, where you can try building your own machine learning code. A good starter project is https://www.kaggle.com/c/titanic or any other project tagged as ‘Knowledge’.

3.1.4 Argonne Training Program on Extreme-Scale Computing (From Schiller)

The Argonne Training Program on Extreme-Scale Computing (ATPESC) provides training on the key skills, approaches, and tools to design, implement, and execute computational science and engineering applications on high-performance computing systems. The program is normally offered as an intensive, two-week training workshop. Video recordings from past workshops are available on the Argonne National Laboratory YouTube channel. Of interest for materials simulation and data science are the following tracks:

Track 4 – Visualization and Data Analysis (5 recorded lectures, total viewing time 5-6 hours)

Track 5 – Numerical Algorithms and Software (10 recorded lectures, total viewing time 9-10 hours)

Track 8 – Machine Learning and Deep Learning for Science (11 recorded lectures, total viewing time 5-6 hours)

The content is suitable for graduate students who have completed introductory courses in scientific computing and/or parallel computing.

3.2

3.2.1 TensorFlow (From Qi Wang)

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications.
Link

3.2.2 PyTorch (From Qi Wang)

PyTorch is an open source machine learning framework that accelerates the path from research prototyping to production deployment. It offers functionality similar to that of TensorFlow.
Link

3.2.3 Q-Chem (From Vitaly Rassolov)

Q-Chem is a general-purpose electronic structure package[1][2][3] featuring a variety of established and new methods implemented using innovative algorithms that enable fast calculations of large systems on various computer architectures, from laptops and regular lab workstations to midsize clusters and HPCC, using density functional and wave-function based approaches. It offers an integrated graphical interface and input generator; a large selection of functionals and correlation methods, including methods for electronically excited states and open-shell systems; solvation models; and wave-function analysis tools. In addition to serving the computational chemistry[4] community, Q-Chem also provides a versatile code development platform.
Link

3.2.4 Carolina Materials Database (From Jianjun Hu)

We have developed a database of hypothetical materials, available at http://www.carolinamatdb.org/. It contains 36,847 inorganic compounds with over 70,000 calculated properties. These compounds were generated by our deep-learning-based generative models for cubic crystal structures.

3.2.5 Physics-Informed Neural Networks (PINN) on GitHub (From Qi Wang)

We introduce physics-informed neural networks – neural networks that are trained to solve supervised learning tasks while respecting any given law of physics described by general nonlinear partial differential equations. We present our developments in the context of solving two main classes of problems: data-driven solution and data-driven discovery of partial differential equations. Depending on the nature and arrangement of the available data, we devise two distinct classes of algorithms, namely continuous-time and discrete-time models. The resulting neural networks form a new class of data-efficient universal function approximators that naturally encode any underlying physical laws as prior information. In the first part, we demonstrate how these networks can be used to infer solutions to partial differential equations and obtain physics-informed surrogate models that are fully differentiable with respect to all input coordinates and free parameters. In the second part, we focus on the problem of data-driven discovery of partial differential equations.
Link
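
The core idea described above, fitting a network so that it satisfies a differential equation at a set of collocation points as well as the available data, can be sketched on a toy problem. The example below is not the implementation from the linked repository. It assumes a hypothetical test equation u'(x) = -u(x) with u(0) = 1 on [0, 1], and it uses a single hidden layer whose weights are frozen at random values, so that training the output layer reduces to a linear least-squares problem that NumPy can solve directly. A full PINN would instead train all weights by gradient-based optimization with automatic differentiation.

```python
import numpy as np

# Frozen random hidden layer (an assumption made to keep the sketch short).
rng = np.random.default_rng(0)
n_hidden = 20
w = rng.normal(0.0, 2.0, n_hidden)   # hidden weights
b = rng.uniform(-1.0, 1.0, n_hidden)  # hidden biases


def features(x):
    """Return tanh hidden activations and their derivatives w.r.t. x."""
    z = np.outer(x, w) + b
    phi = np.tanh(z)
    dphi = (1.0 - phi**2) * w  # chain rule: d/dx tanh(w x + b)
    return phi, dphi


# Collocation points where the equation residual is enforced.
x_col = np.linspace(0.0, 1.0, 50)
phi, dphi = features(x_col)
phi0, _ = features(np.array([0.0]))

# Physics rows: u'(x_i) + u(x_i) = 0.  Data row: u(0) = 1.
A = np.vstack([dphi + phi, phi0])
rhs = np.concatenate([np.zeros(len(x_col)), [1.0]])

# Output-layer weights minimizing the combined physics and data residual.
a, *_ = np.linalg.lstsq(A, rhs, rcond=None)


def u(x):
    """Evaluate the fitted surrogate at one or more points."""
    phi_x, _ = features(np.atleast_1d(x))
    return phi_x @ a


# Compare with the exact solution exp(-x) of the assumed test problem.
x_test = np.linspace(0.0, 1.0, 11)
max_error = np.max(np.abs(u(x_test) - np.exp(-x_test)))
print(max_error)
```

The only observed value here is the initial condition; everything else the surrogate learns comes from the equation residual, which is what makes the approach data-efficient.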