Our Skills | Datastatistica

Soft Skills

Understanding the problems of a domain and then brainstorming all possible solutions,
Imagining and experimenting with new ideas that come from the data of a domain,
Illustrating the benefits that data can bring to a business or an organization.

Technical Skills

R and Python programming,
Using software such as Excel, JMP, Tableau, Power BI, SPSS, and STATISTICA.

Statistics, probability, and machine learning skills:

Data preprocessing:

Data preprocessing techniques include data handling, data cleaning, data transformation, data reduction, data scaling, data balancing, and checking data reliability and validity.
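As an illustration of two of these techniques, data cleaning and data scaling, here is a minimal sketch in plain Python. The function names and the sample values are hypothetical, and mean imputation is only one of several possible ways to handle missing values.

```python
from statistics import mean

def impute_missing(values):
    """Data cleaning: replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Data scaling: rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical measurements with one missing entry.
raw = [10.0, None, 30.0, 20.0]
cleaned = impute_missing(raw)     # [10.0, 20.0, 30.0, 20.0]
scaled = min_max_scale(cleaned)   # [0.0, 0.5, 1.0, 0.5]
```

In practice the same steps are usually carried out with a data-frame library, but the logic stays the same.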

Data visualization:

Using data visualization tools to create illustrative dashboards and highlight features of the data such as trends, patterns, cycles, and outliers.

Probability concepts and probability distributions:

Basic concepts of probability, such as conditional probability, Bayes' theorem, and common probability distributions, play an important role in machine learning.

Sampling techniques:

Applying an appropriate sampling method makes it possible to draw strong statistical inferences about the population.
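One common sampling method is stratified sampling, in which each subgroup of the population is sampled in proportion to its size. The sketch below is an illustration only: the function name `stratified_sample`, the `region` field, and the population are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum defined by key(record)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    sample = []
    for group in strata.values():
        # Take at least one record per stratum so no group is left out.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical population: 80 records from "north", 20 from "south".
population = [{"id": i, "region": "north" if i < 80 else "south"}
              for i in range(100)]
sample = stratified_sample(population, key=lambda r: r["region"], fraction=0.1)
```

With a 10% fraction, the sample keeps the 80/20 split of the population, which a simple random sample of the same size would only match on average.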

Statistical Inference:

Statistical inference skills help researchers draw conclusions about a population from a sample, through point and interval estimation and hypothesis testing. Statistical inference is a method of making decisions about the parameters of a population based on random sampling.
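Point estimation, interval estimation, and hypothesis testing for a population mean can be sketched as follows. This is a simplified illustration that assumes a normal approximation (for small samples a t distribution would normally be preferred); the function names and the measurements are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Point estimate and normal-approximation interval for the population mean."""
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / sqrt(n)                     # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided critical value
    return m, (m - z * se, m + z * se)

def z_test_mean(sample, mu0):
    """Two-sided z-test of H0: population mean equals mu0."""
    se = stdev(sample) / sqrt(len(sample))
    z = (mean(sample) - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical measurements.
measurements = [5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 5.0, 4.9, 5.3, 5.0]
estimate, (lower, upper) = mean_confidence_interval(measurements)
z_stat, p_value = z_test_mean(measurements, mu0=5.5)
```

A small `p_value` suggests that the sample is unlikely under the hypothesised mean `mu0`.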

Machine Learning Techniques
- Supervised learning, which involves determining the type of dataset; splitting the dataset into training, validation, and test sets; determining the input features of the training set, which should carry enough information for the model to predict the output accurately; choosing a suitable algorithm for the model; running the algorithm on the training set; and evaluating the accuracy of the model on the test set. Supervised learning can be further divided into two types of problems:

  - Classification, with common algorithms such as logistic regression, k-nearest neighbors, support vector machines (SVM), neural networks, and decision trees/random forests,
  - Regression, which covers simple/multiple linear regression, non-linear regression, generalized linear (mixed) models, mediation, moderation, and conditional process models, multivariate linear regression, and Ridge and Lasso regression.
- Unsupervised learning, which helps to find hidden patterns and insights in the given data. Here, we have the input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format. Unsupervised learning can be further categorized into three types of problems:
  - Clustering, with common algorithms such as k-means clustering, hierarchical clustering, and density-based clustering.
  - Principal Component Analysis (PCA)
  - Association rules
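
The supervised learning steps listed above (split the data, fit on the training set, evaluate on the test set) can be sketched in plain Python with a k-nearest neighbor classifier. This is a minimal illustration under stated assumptions: the data are synthetic, the helper names are hypothetical, and the validation set is left out for brevity.

```python
import random
from collections import Counter
from math import dist

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them as the test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def knn_predict(train, features, k=3):
    """Predict the majority label among the k nearest training points."""
    neighbours = sorted(train, key=lambda row: dist(row[0], features))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Share of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

# Synthetic data (an assumption for illustration): two well-separated classes.
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "A") for _ in range(40)]
data += [((rng.gauss(6, 1), rng.gauss(6, 1)), "B") for _ in range(40)]

train, test = train_test_split(data)
score = accuracy(train, test)
```

Keeping the test rows out of `train` is the important design choice here: the accuracy is only meaningful if the model has never seen the rows it is evaluated on.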

Time Series Analysis
- Exponential Smoothing
- Autoregressive Integrated Moving Average (ARIMA/SARIMA)
- Linear Regression with Time Series Components
- Autoregressive Distributed Lag (ARDL) Model
- Time series machine learning (for noisy data, long sequences, high-dimensional data)
  - Time series clustering
  - Time series classification
  - Time series forecasting (feedforward neural network (MLP), recurrent neural network (RNN), long short-term memory (LSTM))
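
As a small example of the first method in this list, simple exponential smoothing updates a running level as a weighted average of the newest observation and the previous level. The sketch below assumes the level is initialised with the first observation; the function name and the demand figures are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    alpha close to 1 follows the latest values; alpha close to 0 smooths heavily.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # assumption: initialise the level with the first value
    smoothed = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        smoothed.append(level)
    return smoothed

# Hypothetical demand figures.
demand = [10.0, 12.0, 14.0, 12.0]
smoothed = simple_exponential_smoothing(demand, alpha=0.5)  # [10.0, 11.0, 12.5, 12.25]
forecast = smoothed[-1]  # one-step-ahead forecast for the next period: 12.25
```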

Multi-Criteria Decision-Making Methods: the Analytic Hierarchy Process (AHP/Fuzzy AHP), the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS/Fuzzy TOPSIS), Interpretive Structural Modeling (ISM), and DEMATEL.
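
To give a flavour of these methods, here is a minimal sketch of crisp (non-fuzzy) TOPSIS in plain Python, assuming vector normalisation of the decision matrix. The function name, the weights, and the three alternatives are hypothetical.

```python
from math import sqrt

def topsis(matrix, weights, benefit):
    """Score alternatives by relative closeness to the ideal solution.

    matrix:  one row per alternative, one column per criterion
    weights: criterion weights
    benefit: True where larger is better, False where smaller is better
    Assumes the alternatives are not all identical.
    """
    n_crit = len(weights)
    # Vector normalisation: divide each column by its Euclidean norm.
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_pos = sqrt(sum((v - i) ** 2 for v, i in zip(row, ideal)))
        d_neg = sqrt(sum((v - a) ** 2 for v, a in zip(row, anti)))
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Hypothetical alternatives described by (cost, quality).
alternatives = [[100.0, 9.0], [200.0, 5.0], [150.0, 7.0]]
scores = topsis(alternatives, weights=[0.5, 0.5], benefit=[False, True])
```

A score of 1 means the alternative coincides with the ideal solution on every criterion, and a score of 0 means it coincides with the anti-ideal solution.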

Experimental Design with the following key steps
- Determining independent and dependent variables and how they are related
- Writing a specific, testable hypothesis
- Designing experimental treatments to manipulate the independent variable
- Assigning subjects to groups, either between-subjects or within-subjects
- Planning how to measure the dependent variable, while controlling any extraneous variables that might influence the results.
- Statistical Analysis
  - Experimental design I: Single-Factor Designs
  - Experimental design II: Factorial Designs
  - Correlational Research (correlation and regression)
  - Small N Design (single subject method)
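
Two of the steps above, assigning subjects to groups and analysing a single-factor design, can be sketched in plain Python. This is an illustration only: the function names and the scores are hypothetical, and the example stops at the F statistic rather than computing a p-value.

```python
import random
from statistics import mean

def assign_between_subjects(subjects, conditions, seed=0):
    """Randomly assign each subject to exactly one condition, in equal shares."""
    rng = random.Random(seed)
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, subject in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(subject)
    return groups

def one_way_anova_f(groups):
    """F statistic for a single-factor design: between-group over within-group variance."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical study: 12 subjects, three treatment levels.
groups = assign_between_subjects(list(range(12)), ["control", "low", "high"])

# Hypothetical dependent-variable scores per treatment level.
f_stat = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

A large F statistic indicates that the differences between group means are large relative to the variation within groups.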

Shiny Programming to build interactive web apps and dashboards straight from R.
Data envelopment analysis (DEA) to measure the relative efficiency of a group of similar entities, or decision-making units (DMUs), with multiple inputs and multiple outputs, using the CCR and BCC models and the Malmquist index (MI), which evaluates efficiency change over time and is applied to panel data.