DATA MANIPULATION TECHNIQUES
Data manipulation is the process of adjusting data to make it easier to read or better organized. It is crucial for growing organizations and businesses: raw data has to be adjusted before it can be used for trend analysis, cutting costs, analyzing customer behaviour and similar tasks. Data manipulation also improves overall efficiency, because it shows you which information is relevant and which is not. Let’s look at data manipulation with NumPy and pandas.
The data manipulation capabilities of pandas are built on top of the NumPy library, so NumPy is a required dependency of pandas. NumPy stands for ‘Numerical Python’. It is a Python library that provides fast mathematical computations and processing of one-dimensional and multidimensional arrays and matrices. Before using NumPy, check whether it is installed. Once it is installed, you can import it in your IDE; in my case, that is Jupyter Notebook.
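A minimal sketch of the installation check and import, assuming a standard pip-based Python environment. It uses the standard-library importlib.util.find_spec, which returns None when a package cannot be found, and the conventional np alias for NumPy.

```python
import importlib.util

# find_spec returns None when the package cannot be found in this environment
if importlib.util.find_spec("numpy") is None:
    print("NumPy is not installed; install it with: pip install numpy")
else:
    import numpy as np  # "np" is the conventional alias
    print("NumPy version:", np.__version__)
```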
You can create arrays using NumPy. There are several ways to do this, including:
Creating a simple 1-D array.
Creating an array with a set sequence of values.
Creating a 3×4 matrix filled with ones.
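The three approaches listed above can be sketched with np.array, np.arange and np.ones. The specific values are arbitrary choices for illustration, not the ones from the original notebook.

```python
import numpy as np

# 1. A simple 1-D array built from a Python list
a = np.array([1, 2, 3, 4, 5])

# 2. An array with a set sequence: start at 0, stop before 20, step by 5
seq = np.arange(0, 20, 5)

# 3. A 3x4 matrix with every element equal to 1 (float64 by default)
ones = np.ones((3, 4))

print(a, seq, ones, sep="\n")
```

For evenly spaced values where you know the number of points rather than the step size, np.linspace is a common alternative to np.arange.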
Array elements can be accessed by index in several ways. To print a range of elements, use slicing: you define a range with the start:stop (or start:stop:step) syntax, and NumPy returns the elements in that range as a new array object, which for basic slices is a view onto the original array’s data.
NumPy arrays support a wide range of operations, performed either on a single array or on a combination of arrays. These include basic element-wise arithmetic such as addition, subtraction, multiplication and division.
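A minimal sketch of basic mathematical operations, with two small arrays chosen for illustration. Arithmetic between same-shaped arrays is element-wise, and a scalar is broadcast to every element.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Operations on a combination of arrays are element-wise
total = x + y          # 11, 22, 33, 44
diff = y - x           # 9, 18, 27, 36
prod = x * y           # 10, 40, 90, 160
ratio = y / x          # 10.0 in every position (true division returns floats)
dot = x @ y            # dot product: 10 + 40 + 90 + 160 = 300

# Operations on a single array
doubled = x * 2        # the scalar is broadcast to every element: 2, 4, 6, 8
print(total, diff, prod, ratio, dot, doubled, sep="\n")
print(x.sum(), x.mean(), x.max())   # aggregates: 10 2.5 4
```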
Pandas is an open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools. It is built on top of NumPy and uses much of its functionality. To use pandas, start by importing the pandas module in your IDE.
A DataFrame is a two-dimensional data structure with labelled rows and columns, similar to a spreadsheet or a SQL table. We can create an empty DataFrame using pandas.
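A minimal sketch that imports pandas under its conventional alias pd and creates empty DataFrames. The column names in the second example are hypothetical placeholders.

```python
import pandas as pd

# A completely empty DataFrame: no rows and no columns
df = pd.DataFrame()
print(df.empty)      # True
print(df.shape)      # (0, 0)

# An empty DataFrame with named columns, to be filled later
# ("name" and "age" are illustrative column names)
df2 = pd.DataFrame(columns=["name", "age"])
print(df2.shape)     # (0, 2)
```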
You can also convert a NumPy array into a DataFrame.
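A short sketch of the conversion: passing a 2-D NumPy array to the pd.DataFrame constructor turns each array row into a DataFrame row. The column labels "a" to "d" are arbitrary; without the columns argument, pandas uses 0, 1, 2, ... as labels.

```python
import numpy as np
import pandas as pd

arr = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns

# Each row of the array becomes a row of the DataFrame
df = pd.DataFrame(arr, columns=["a", "b", "c", "d"])
print(df)
```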
Let’s look at reading a dataset into a DataFrame, using the Titanic dataset as an example.
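A sketch of reading a CSV file with pd.read_csv. In practice you would pass the path to your copy of the Titanic file; the file name "titanic.csv" is an assumption. To keep the example self-contained, it reads a few made-up rows in a Titanic-style column layout from a string, so these are not real passenger records.

```python
import io
import pandas as pd

# With a real file you would write: df = pd.read_csv("titanic.csv")
# (hypothetical file name). The rows below are made-up sample data.
csv_text = """PassengerId,Survived,Pclass,Sex,Age,Fare
1,0,3,male,22,7.25
2,1,1,female,38,71.3
3,1,3,female,,7.9
4,0,2,male,35,13.0
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())     # first rows of the DataFrame
print(df.shape)      # (number of rows, number of columns)
print(df.dtypes)     # column types inferred by read_csv
```

The empty Age field in the third row is read as NaN, which is how pandas represents missing values.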
We can select one or more columns by name.
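A minimal sketch of column selection on a small stand-in DataFrame (made-up values). Selecting one column with a string returns a Series; selecting with a list of names returns a DataFrame; .loc with a label slice selects a contiguous range of columns, including both endpoints.

```python
import pandas as pd

# Small stand-in DataFrame with made-up values
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [22, 38, 26],
    "Fare": [7.25, 71.3, 7.9],
})

ages = df["Age"]                    # single column -> Series
subset = df[["Name", "Fare"]]       # list of columns -> DataFrame
age_to_fare = df.loc[:, "Age":"Fare"]  # label slice, both ends included

print(type(ages).__name__, type(subset).__name__)   # Series DataFrame
print(age_to_fare)
```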
The apply() method in pandas is used to manipulate a DataFrame and to create new variables (columns). It passes each row or column of the DataFrame to a function and returns the values that function produces. The function can be either built-in or user-defined. For instance, apply() can be used to count the number of missing values in each row and column.
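A sketch of counting missing values with apply() and a user-defined function. The DataFrame holds made-up values, and num_missing is a hypothetical helper name. With axis=0 the function receives each column, and with axis=1 it receives each row; both arrive as a pandas Series.

```python
import numpy as np
import pandas as pd

# Made-up data with some missing values (NaN)
df = pd.DataFrame({
    "Age": [22, np.nan, 35, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.3, np.nan, 8.05],
})

def num_missing(x):
    """Count the missing values in one row or column (a pandas Series)."""
    return x.isnull().sum()

col_missing = df.apply(num_missing, axis=0)   # one result per column
row_missing = df.apply(num_missing, axis=1)   # one result per row
print(col_missing)
print(row_missing)

# apply() can also create a new variable from an existing column
df["AgeKnown"] = df["Age"].apply(lambda a: not pd.isna(a))
print(df)
```

For simple cases like these, vectorized methods such as df.isnull().sum() or df["Age"].notna() give the same results and are usually faster. apply() is most useful when the logic does not map onto a built-in method.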