What is Data Science?
- Data science is the study of large amounts of data. It extracts insight from raw, structured, and unstructured data, which is processed using the scientific method, various technologies, and algorithms.
- In short, data science is all about:
- Asking the right questions and analyzing the data.
- Modeling the data using various complex and efficient algorithms.
- Visualizing the data to get a better perspective.
- Understanding the data to make better decisions and find the final result.
Need for Data Science
- Some years ago, data was scarcer and mostly available in structured form, which could easily be stored in Excel sheets and processed using BI tools.
- But in today's world, data has become huge: approximately 2.5 quintillion bytes of data are generated every day, which has led to a data explosion. It was estimated that by 2020, 1.7 MB of data would be created every second for every person on earth. Every company requires data to work, grow, and improve its business.
- Handling such a huge amount of data is a challenging task for every organization. To handle, process, and analyze it, we need complex, powerful, and efficient algorithms and technology, and that technology came into existence as data science. The following are some of the main reasons for using data science technology:
- With the help of data science technology, we can convert huge amounts of raw and unstructured data into meaningful insights.
- Data science technology is being adopted by various companies, whether a big brand or a startup. Google, Amazon, Netflix, etc., which handle large amounts of data, use data science algorithms for a better customer experience.
- Data science is helping to automate transportation, for example by creating self-driving cars, which are the future of transportation.
- Data science can help with various predictions, such as surveys, elections, flight ticket confirmation, etc.
Structured Data and Unstructured Data:
- Structured data is data such as an ordinary student database.
- Unstructured data is data such as your Facebook and Google data.
- Both structured and unstructured data need to be manipulated before they can be used by your machine learning and artificial intelligence engines.
Data Science Job
- According to various surveys, data scientist is becoming the most in-demand job of the 21st century because of the increasing demand for data science. Some people also call it "the hottest job title of the 21st century". Data scientists are experts who can use various statistical tools and machine learning algorithms to understand and analyze data.
- The average salary range for a data scientist is approximately $95,000 to $165,000 per year, and according to various studies, about 11.5 million jobs will be created by the year 2026.
Types of Data Science Job
- If you learn data science, you get the opportunity to explore various exciting job roles in this domain. The main job roles are given below:
- Data Scientist
- Data Analyst
- Machine learning expert
- Data engineer
- Data Architect
- Data Administrator
- Business Analyst
- Business Intelligence Manager
1. Data Analyst
- A data analyst is an individual who mines large amounts of data, models the data, and looks for patterns, relationships, trends, and so on. In the end, they produce visualizations and reports for analyzing the data to support decision-making and problem-solving.
- Skill required
2. Machine Learning Expert
- A machine learning expert is someone who works with the various machine learning algorithms used in data science, such as regression, clustering, classification, decision trees, random forests, etc.
- Skill Required
3. Data Engineer
- A data engineer works with massive amounts of data and is responsible for building and maintaining the data architecture of a data science project. A data engineer also creates the data set processes used in modeling, mining, acquisition, and verification.
- Skill required
4. Data Scientist
- A data scientist is a professional who works with huge amounts of data to come up with compelling business insights through the deployment of various tools, techniques, methodologies, algorithms, etc.
- Skill required
- To become a data scientist, you should have technical language skills such as R, SAS, SQL, Python, Hive, Pig, Apache Spark, and MATLAB. Data scientists must also have an understanding of statistics, mathematics, and visualization, as well as communication skills.
Requirement for Data Science
Non-Technical Requirement
- Curiosity: To learn data science, one must be curious. When you are curious and ask various questions, you can understand the business problem more easily.
- Critical thinking: This is also required for a data scientist so that you can find multiple new ways to solve a problem efficiently.
- Communication skills: Communication skills are very important for a data scientist because, after solving a business problem, you need to communicate it to the team.
Technical Requirement
- Machine learning: To understand data science, you must understand the concept of machine learning. Data science uses machine learning algorithms to solve various problems.
- Mathematical modeling: Mathematical modeling is required to make fast mathematical calculations and predictions from the available data.
- Statistics: A basic understanding of statistics, such as mean, median, and variance, is required. It is needed to extract knowledge and obtain better results from the data.
- Computer programming: For data science, knowledge of at least one programming language is required. R and Python, along with frameworks such as Spark, are commonly required for data science.
- Databases: An in-depth understanding of databases, such as SQL, is essential for data science to retrieve the data and work with it.
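As a small illustration of the statistics mentioned above, the following sketch computes the mean, median, and variance with Python's standard `statistics` module. The exam scores are made-up sample values, not data from this article.

```python
import statistics

# Hypothetical exam scores, used only for illustration.
scores = [62, 75, 75, 81, 90, 97]

mean_score = statistics.mean(scores)           # arithmetic average
median_score = statistics.median(scores)       # middle value of the sorted data
variance_score = statistics.pvariance(scores)  # population variance

print(mean_score, median_score, variance_score)
```

Here `pvariance` treats the list as the whole population; `statistics.variance` would be the choice if the scores were only a sample.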
Business Intelligence Vs Data Science:
Business intelligence | Data Science |
---|---|
Business intelligence deals with structured data, e.g., data warehouse. | Data science deals with structured and unstructured data, e.g., weblogs, feedback, etc. |
Analytical (historical data) | Scientific (goes deeper to find the reason behind the data report) |
Statistics and visualization are the two skills required for business intelligence. | Statistics, visualization, and machine learning are the required skills for data science. |
Business intelligence focuses on both past and present data. | Data science focuses on past data, present data, and also future predictions. |
Data Science Components
- Statistics
- Domain Expertise
- Data engineering
- Visualization
- Advanced computing
- Mathematics
- Machine Learning
1. Statistics
- Statistics is one of the most important components of data science. Statistics is a way to collect and analyze large amounts of numerical data and find meaningful insights in it.
2. Domain Expertise
- In data science, domain expertise binds data science together. Domain expertise means specialized knowledge or skills in a particular area. In data science, there are various areas for which we need domain experts.
3. Data engineering
- Data engineering is the part of data science that involves acquiring, storing, retrieving, and transforming data. Data engineering also includes adding metadata (data about data) to the data.
4. Visualization
- Data visualization means representing data in a visual context so that people can easily understand its significance. Data visualization makes it easy to take in large amounts of data through visuals.
5. Advanced computing
- Advanced computing involves designing, writing, debugging, and maintaining the source code of computer programs.
6. Mathematics
- Mathematics is a critical part of data science. Mathematics involves the study of quantity, structure, space, and change. For a data scientist, a good knowledge of mathematics is essential.
7. Machine learning
- Machine learning is the backbone of data science. Machine learning is about training a machine so that it can act like a human brain. In data science, we use various machine learning algorithms to solve problems.
Tools for Data Science
Data Analysis tools:
- R, Python, Statistics, SAS, Jupyter, R Studio, MATLAB, Excel, RapidMiner.
Data Warehousing:
- ETL, SQL, Hadoop, Informatica/Talend, AWS Redshift
Data Visualization tools:
- R, Jupyter, Tableau, Cognos.
Machine learning tools:
- Spark, Mahout, Azure ML studio.
Machine learning in Data Science
- To become a data scientist, one should also be familiar with machine learning and its algorithms, as various machine learning algorithms are widely used in data science. The following are the names of some machine learning algorithms used in data science:
- Regression
- Decision tree
- Clustering
- Principal component analysis
- Support vector machines
- Naive Bayes
- Artificial neural network
- Apriori
Here is a brief introduction to a few of the important algorithms:
Linear Regression Algorithm
- Linear regression is one of the most popular machine learning algorithms based on supervised learning. It works on regression, which is a method of modeling target values based on independent variables. It takes the form of a linear equation that relates the set of inputs to the predicted output. This algorithm is mostly used in forecasting and predictions. Since it shows the linear relationship between the input and output variables, it is called linear regression.
- The equation below describes the relationship between the x and y variables:
y = mx + c
- Where, y = Dependent variable
- x = Independent variable
- m = Slope
- c = Intercept.
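To make the equation concrete, here is a minimal sketch that estimates the slope m and intercept c with ordinary least squares in plain Python. The helper name `fit_line` and the sample data are assumptions made for illustration; they do not come from the article.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    m = numerator / denominator
    c = mean_y - m * mean_x
    return m, c

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
m, c = fit_line(xs, ys)
prediction = m * 6 + c  # forecast for an unseen input x = 6
print(m, c, prediction)
```

With real, noisy data the points would not sit exactly on a line, and the fitted m and c would be the values that minimize the total squared vertical distance to the points.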
Decision Tree
- The decision tree algorithm is another machine learning algorithm that belongs to the family of supervised learning algorithms. It is used for both classification and regression problems.
- In the decision tree algorithm, we solve the problem using a tree representation in which each node represents a feature, each branch represents a decision, and each leaf represents the outcome.
The following is an example for a job offer problem:
[Figure: decision tree for the job offer problem]
- In the decision tree, we start from the root of the tree and compare the value of the root attribute with the record's attribute. On the basis of this comparison, we follow the branch matching that value and then move to the next node. We continue comparing these values until we reach a leaf node with the predicted class value.
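The traversal described above can be sketched in a few lines of Python. The tree below is a hypothetical job-offer tree invented for illustration (the feature names `salary_at_least_50k` and `commute_under_1h` are assumptions, not taken from the article's figure), and it is written by hand rather than learned from data.

```python
# Hypothetical job-offer tree: internal nodes test a feature, leaves hold a class.
tree = {
    "feature": "salary_at_least_50k",
    "branches": {
        False: "decline",
        True: {
            "feature": "commute_under_1h",
            "branches": {False: "decline", True: "accept"},
        },
    },
}

def predict(node, record):
    """Walk from the root to a leaf, following the branch that matches the record."""
    while isinstance(node, dict):
        value = record[node["feature"]]
        node = node["branches"][value]
    return node

offer = {"salary_at_least_50k": True, "commute_under_1h": False}
print(predict(tree, offer))
```

A real decision tree algorithm would also choose which feature to test at each node from training data; this sketch only shows how a finished tree is used to classify one record.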
K-Means Clustering
- K-means clustering is also one of the most popular machine learning algorithms, and it belongs to the family of unsupervised learning algorithms. It solves the clustering problem.
- If we are given a data set of items with certain features and values, and we need to categorize those items into groups, this type of problem can be solved using the k-means clustering algorithm.
- The k-means clustering algorithm aims to minimize an objective function, known as the squared error function, which is given as:
$$J(V) = \sum_{i=1}^{C} \sum_{j=1}^{c_i} \left( \lVert x_i - v_j \rVert \right)^2$$
- Where,
- J(V) => Objective function
- '||xi - vj||' => Euclidean distance between xi and vj.
- 'ci' => Number of data points in the ith cluster.
- C => Number of clusters.
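As a rough sketch of how this objective is minimized, the code below implements plain k-means in Python: it assigns each point to its nearest center, moves each center to the mean of its points, and then evaluates the squared error. The function names, the sample points, and the starting centers are all assumptions chosen for illustration; `math.dist` requires Python 3.8 or later.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain k-means: assign points to the nearest center, then recompute centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

def squared_error(centers, clusters):
    """Objective J: sum of squared Euclidean distances to the assigned center."""
    return sum(
        math.dist(p, center) ** 2
        for center, cluster in zip(centers, clusters)
        for p in cluster
    )

# Hypothetical 2-D points forming two obvious groups, with hand-picked starting centers.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
j = squared_error(centers, clusters)
print(centers, j)
```

In practice the starting centers are usually chosen at random, and the algorithm is run several times because different starts can converge to different groupings.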
How to solve a problem in Data Science using Machine learning algorithms?
- Now, let's look at the most common types of problems that occur in data science and the approach to solving them. In data science, problems are solved using algorithms, and the diagram below shows which algorithms apply to which possible questions:
[Figure: Possible Questions]
Is this A or B?
- This kind of problem has only two fixed answers, such as Yes or No, 1 or 0, may or may not. Such problems can be solved using classification algorithms.
Is this different?
- This kind of question concerns data that follows various patterns, where we need to find the odd one out. Such problems can be solved using anomaly detection algorithms.
How much or how many?
- This kind of problem asks for numerical values or figures, such as what the time is or what the temperature will be today, and can be solved using regression algorithms.
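For the "Is this different?" question, one simple approach (among many) is a z-score rule: flag any value that lies more than a chosen number of standard deviations from the mean. The sketch below assumes made-up sensor readings and a threshold of 2; both are illustrative choices, not values from the article.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) / spread > threshold]

# Hypothetical temperature readings with one obvious outlier.
readings = [20, 21, 19, 22, 20, 21, 48]
anomalies = find_anomalies(readings)
print(anomalies)
```

This rule assumes the values are not all identical (otherwise the spread is zero) and works best when the normal values are roughly bell-shaped.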
Data Science Lifecycle
- Discovery
- Data Preparation
- Model Planning
- Model building
- Operationalize
- Communicate Results
Discovery
- Discover all the requirements of the project, such as the number of people, technology, time, data, and the end goal, and then frame the business problem.
Data preparation:
- Data preparation is also known as data munging. In this phase, we need to perform the following tasks:
- Data cleaning
- Data Reduction
- Data integration
- Data transformation
Model Planning
- Plan the various methods and techniques to establish the relations between input variables.
- We apply exploratory data analysis (EDA) using various statistical formulas and visualization tools to understand the relations between variables and to see what the data can tell us.
- Common tools used for model planning are:
- SQL Analysis Services
- R
- SAS
- Python
Model-building
- Create datasets for training and testing purposes. Apply different techniques, such as association, classification, and clustering, to build the model.
- Some common model-building tools:
- SAS Enterprise Miner
- WEKA
- SPSS Modeler
- MATLAB
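The step of creating training and testing datasets can be sketched as follows. The helper `split_dataset`, its 25% test ratio, and the fixed seed are assumptions made for illustration; libraries such as scikit-learn ship their own, more complete splitting utilities.

```python
import random

def split_dataset(rows, test_ratio=0.25, seed=42):
    """Shuffle a copy of the rows and split them into training and testing sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    test_size = int(len(shuffled) * test_ratio)
    return shuffled[test_size:], shuffled[:test_size]

# Hypothetical dataset of 20 records, represented here by their ids.
rows = list(range(20))
train, test = split_dataset(rows)
print(len(train), len(test))
```

The model is built on the training set only, so that the testing set gives an honest estimate of how it behaves on data it has not seen.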
Operationalize:
- Deliver the final reports of the project, along with briefings, code, and technical documents.
- This phase gives you a clear overview of the complete project's performance and other components on a small scale before full deployment.
Communicate results:
- Communicate the findings and final result with the business team.
Applications of Data Science:
- Image recognition and speech recognition
- Gaming world
- Internet search
- Transport
- Healthcare
- Recommendation systems
- Risk detection
Image recognition and speech recognition
- Data science is currently used for image and speech recognition. When you upload an image on Facebook, you start getting suggestions to tag your friends. This automatic tagging suggestion uses an image recognition algorithm, which is part of data science.
- When you say something to "Ok Google", Siri, Cortana, etc., and these devices respond to your voice, this is made possible by a speech recognition algorithm.
Gaming world
Internet Search
Transport
- Transport industries are also using data science technology to create self-driving cars. With self-driving cars, it will be easier to reduce the number of road accidents.
Healthcare
- In the healthcare sector, data science provides many benefits. Data science is being used for tumor detection, drug discovery, medical image analysis, virtual medical bots, etc.
Recommendation systems
- Most companies, such as Amazon, Netflix, Google Play, etc., use data science technology to create a better user experience with personalized recommendations. For example, when you search for something on Amazon and start getting suggestions for similar products, this is because of data science technology.
Risk detection
- Finance industries have always had problems with fraud and the risk of losses, but with the help of data science, these can be mitigated.
- Most finance companies are looking for data scientists to avoid risk and any type of loss while increasing customer satisfaction.