# What is Data Science?

Data Science is the process and method for extracting knowledge and insights from large volumes of disparate data. It’s an interdisciplinary field involving mathematics, statistical analysis, data visualization, machine learning, and more.

# Clustering in Data Science

Clustering is an unsupervised machine learning technique used to group similar data points together and discover underlying patterns. Among clustering algorithms, k-means is one of the most popular and easiest to use.

## 1. Introduction

Indonesia is a country in Southeast Asia and Oceania, between the Indian and Pacific oceans. It consists of more than seventeen thousand islands, including Sumatra, Java, Sulawesi, and parts of Borneo (Kalimantan) and New Guinea (Papua). Indonesia is the world’s largest island country and the 14th-largest country by land area, at 1,904,569 square kilometres (735,358 square miles).

The business idea is to open an Indonesian restaurant that serves all kinds of Indonesian dishes. The goal is to introduce Indonesian culture to the world, using Indonesian cuisine as the medium. For this business to be successful, and to stay in line with that goal, we have to find a location that allows our restaurant to gain recognition and exposure from foreign visitors to the country.

# Data Preparation

Before we go further into the data preparation, we have to know which city in Indonesia receives the most foreign visitors in total. To do that, we have to find information on foreign visitors to Indonesia by point of entry.

```python
# Import Libraries
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

# Using Nominatim geocoder API
geolocator = Nominatim(user_agent='Chris_P_Bacon_')  # Crispy Bacon!!
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
bali_df['Location'] = bali_df['Sub-District'].apply(geocode)
```

# Exploratory Data Analysis

In this exploratory data analysis section, I want to explore the dataset to see how frequently each venue category occurs, and rank the 1st to 3rd most common venues in each sub-district. The ‘Venue Category’ column in our dataset contains information that is useful for our machine learning model. Unfortunately, the column’s datatype is string, which the clustering algorithm cannot use directly, so it has to be converted to numeric values first.
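One common way to turn a string column into numbers is one-hot encoding. The sketch below assumes that approach and uses `pandas.get_dummies` on a small made-up table; the sub-district names and venue rows are illustrative only and are not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical toy data, only to illustrate the encoding step
venues = pd.DataFrame({
    "Sub-District": ["Kuta", "Kuta", "Ubud", "Ubud", "Ubud"],
    "Venue Category": ["Restaurant", "Hotel", "Restaurant", "Cafe", "Restaurant"],
})

# One 0/1 indicator column per venue category
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)

# Put the sub-district label back so rows can be grouped later
onehot.insert(0, "Sub-District", venues["Sub-District"])

print(onehot)
```

Each venue becomes a row of zeros with a single one in the column of its category, which is a form that numeric algorithms can work with.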

```python
# Groupby 'Sub-District'
bali_grouped = bali_onehot.groupby('Sub-District').mean().reset_index()
```
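Once the mean frequency of each category is available per sub-district, ranking the most common venues is a matter of sorting each row. The following is a minimal sketch of that step, assuming a grouped table shaped like `bali_grouped`; the helper `top_venues`, the column labels, and the frequency values are hypothetical.

```python
import pandas as pd

# Hypothetical grouped frequencies (one row per sub-district)
grouped = pd.DataFrame({
    "Sub-District": ["Kuta", "Ubud"],
    "Cafe": [0.1, 0.3],
    "Hotel": [0.5, 0.1],
    "Restaurant": [0.4, 0.6],
})

def top_venues(row, n=3):
    # Sort the category frequencies in descending order and keep the top n labels
    freqs = row.drop("Sub-District").astype(float)
    return freqs.sort_values(ascending=False).index[:n].tolist()

ranked = pd.DataFrame(
    [[row["Sub-District"]] + top_venues(row) for _, row in grouped.iterrows()],
    columns=[
        "Sub-District",
        "1st Most Common Venue",
        "2nd Most Common Venue",
        "3rd Most Common Venue",
    ],
)

print(ranked)
```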

# Machine Learning — Clustering Model (K-Means Clustering)

The `KMeans` algorithm clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares. This algorithm requires the number of clusters to be specified. It scales well to large numbers of samples and has been used across a large range of application areas in many different fields.
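To make the inertia criterion concrete, here is a small sketch that computes it by hand with NumPy: every point contributes the squared distance to its nearest cluster center. The function name `inertia` and the four sample points are assumptions chosen for illustration; this is not how scikit-learn computes it internally.

```python
import numpy as np

def inertia(points, centers):
    # Squared Euclidean distance from every point to every center
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Each point contributes the squared distance to its nearest center
    return float(d2.min(axis=1).sum())

# Two well-separated pairs of points (made-up data)
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

one_center = points.mean(axis=0, keepdims=True)        # a single center at (5, 1)
two_centers = np.array([[0.0, 1.0], [10.0, 1.0]])      # one center per pair

print(inertia(points, one_center))   # 104.0
print(inertia(points, two_centers))  # 4.0
```

Going from one center to two drops the inertia sharply because each pair of points gets a center of its own. This is why inertia is useful for comparing different numbers of clusters.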

```python
from sklearn.cluster import KMeans

# Initiate object for clustering dataset (drop the label column, keep only numeric features)
bali_clustering = bali_grouped.drop(columns='Sub-District')

# Build clustering model to find the appropriate number of K
K = range(1, 6)
inertias = []  # Empty list for inertia
for k in K:
    # Building and fitting the model
    KMeansModel = KMeans(n_clusters=k, random_state=4)
    KMeansModel.fit(bali_clustering)
    inertias.append(KMeansModel.inertia_)
```
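The collected inertias are typically inspected for an "elbow", the value of K after which adding more clusters stops reducing the inertia by much. The sketch below is one simple heuristic for locating that bend, using the largest second difference of the curve. The function `elbow_k` and the inertia values are made up for illustration and are not the article's results; in practice, the curve is often just plotted and read by eye.

```python
def elbow_k(ks, inertias):
    # The second difference measures how sharply the curve bends at each k;
    # the first and last k have no neighbors on both sides and are skipped
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k

ks = list(range(1, 6))
made_up_inertias = [100.0, 60.0, 20.0, 17.0, 15.0]  # hypothetical values

print(elbow_k(ks, made_up_inertias))  # 3
```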
• This cluster is distinguishable by the number of convenience stores that appear as the most common venue in almost every sub-district. Having many convenience stores is also common in suburban and rural areas across Indonesia, not just in Bali.
• Based on my observation of the `Folium` map, cluster 1 consists mostly of restaurants along with some other venue categories. Many suburban areas are also labeled as cluster 1 (e.g. the 5 sub-districts near ‘Gilimanuk’ on the top left side of the map), because restaurants exist even in suburban areas.
• “Beaches can also be labeled as recreation sites!” True, but most seashore areas in Bali are owned by hotels, villas, and restaurants. That is also why seashores are labeled as cluster 1 all over the map.

# Conclusion

Based on the results, it is safe to say that I can build a restaurant in the urban areas of Bali. Bali is a small island and a main tourist destination in Indonesia. There are no “uncrowded” areas in Bali; every city/regency is densely populated.

Data Analyst/Scientist | Github: jonando93
