Sailor Shift: Rise and Resonance
Getting Started
Installing and loading the required libraries
pacman::p_load(igraph, tidygraph, ggraph,
               visNetwork, lubridate, clock,
               tidyverse, graphlayouts, concaveman, ggforce, jsonlite, dplyr)
Importing data
t_data <- fromJSON("data/MC1_graph.json",
                   simplifyDataFrame = TRUE)
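To illustrate what the import produces, here is a minimal sketch that parses a small inline node-link JSON string instead of the real file. The toy values are made up for illustration; only the overall shape (a `directed` flag plus `nodes` and `links` arrays) is assumed to match what the later sections read from `t_data`.

```r
library(jsonlite)

# Toy node-link JSON; values are illustrative only, not taken from MC1_graph.json
toy_json <- '{
  "directed": true,
  "nodes": [
    {"id": 0, "name": "Song A", "Node Type": "Song"},
    {"id": 1, "name": "Person B", "Node Type": "Person"}
  ],
  "links": [
    {"source": 1, "target": 0, "Edge Type": "PerformerOf", "key": 0}
  ]
}'

toy_data <- fromJSON(toy_json, simplifyDataFrame = TRUE)

# Top-level elements of the parsed list
names(toy_data)

# With simplifyDataFrame = TRUE, arrays of records become data frames
str(toy_data$nodes)
str(toy_data$links)
```

Checking `names()` and `str()` right after the import is a quick way to confirm that the node and link tables are where the rest of the code expects them.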
1. Introduction
Sailor Shift is one of the most influential figures in the development of “Oceanus Folk” music. From her humble beginnings as a singer on Oceanus Island to her current status as a global superstar, she has come to represent more than personal success: she has carried a niche genre to a worldwide audience. This project uses data analysis and visualization to examine her network of collaborations, her musical influences, and her place in the wider music ecosystem. We show how she has influenced others and been shaped by the zeitgeist, and reflect on what her rise reveals about the new generation of musicians.
2. Data processing
2.1. Extracting Edges and Nodes
nodes_tbl <- as_tibble(t_data$nodes)
edges_tbl <- as_tibble(t_data$links)
2.2. A closer look at the data
2.2.1. Edges
glimpse(edges_tbl)
Rows: 37,857
Columns: 4
$ `Edge Type` <chr> "InterpolatesFrom", "RecordedBy", "PerformerOf", "Composer…
$ source <int> 0, 0, 1, 1, 2, 2, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
$ target <int> 1841, 4, 0, 16180, 0, 16180, 0, 5088, 14332, 11677, 2479, …
$ key <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
length(unique(edges_tbl$`Edge Type`))
[1] 12
unique(edges_tbl$`Edge Type`)
[1] "InterpolatesFrom" "RecordedBy" "PerformerOf"
[4] "ComposerOf" "ProducerOf" "InStyleOf"
[7] "LyricalReferenceTo" "CoverOf" "DistributedBy"
[10] "MemberOf" "LyricistOf" "DirectlySamples"
The edges dataset contains 37,857 records and 4 fields that describe the relationships between entities in the network. Each edge stores the node IDs of its starting and ending points (source and target) and an Edge Type, one of 12 values describing the nature of the relationship, such as “PerformerOf”, “ComposerOf”, or “RecordedBy”. The key field distinguishes between multiple connections between the same node pair.
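The description above can be checked with two simple counts. The sketch below uses a small made-up edge table with the same four columns as `edges_tbl`. How `key` is numbered in the real file is not verified here; the toy values simply assume it increments for repeated node pairs.

```r
library(dplyr)
library(tibble)

# Toy edge table with the same columns as edges_tbl (values are made up)
toy_edges <- tibble(
  `Edge Type` = c("PerformerOf", "ComposerOf", "PerformerOf", "InStyleOf"),
  source = c(1L, 1L, 2L, 3L),
  target = c(0L, 0L, 0L, 0L),
  key    = c(0L, 1L, 0L, 0L)
)

# Number of edges per relationship type, most frequent first
edge_type_counts <- toy_edges %>%
  count(`Edge Type`, sort = TRUE)

# Node pairs connected by more than one edge
parallel_pairs <- toy_edges %>%
  count(source, target, name = "n_edges") %>%
  filter(n_edges > 1)

edge_type_counts
parallel_pairs
```

Running the same two pipelines on `edges_tbl` would show how often parallel edges actually occur in the full network.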
2.2.2. Nodes
glimpse(nodes_tbl)
Rows: 17,412
Columns: 10
$ `Node Type` <chr> "Song", "Person", "Person", "Person", "RecordLabel", "S…
$ name <chr> "Breaking These Chains", "Carlos Duffy", "Min Qin", "Xi…
$ single <lgl> TRUE, NA, NA, NA, NA, FALSE, NA, NA, NA, NA, TRUE, NA, …
$ release_date <chr> "2017", NA, NA, NA, NA, "2026", NA, NA, NA, NA, "2020",…
$ genre <chr> "Oceanus Folk", NA, NA, NA, NA, "Lo-Fi Electronica", NA…
$ notable <lgl> TRUE, NA, NA, NA, NA, TRUE, NA, NA, NA, NA, TRUE, NA, N…
$ id <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1…
$ written_date <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "2020", NA, NA,…
$ stage_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ notoriety_date <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
The nodes dataset contains 17,412 entries, each representing an entity within the music network and categorized under the Node Type column, for example as “Person”, “Song”, or “RecordLabel”. Each node carries attributes that depend on its type: songs have fields such as single, release_date, genre, and notable, while people may have stage_name and notoriety_date. The many missing values (NA) indicate that certain attributes apply only to specific node types.
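One way to test the claim that missing values follow node type is to compute, for each type, the share of non-missing values per attribute. The following is a minimal sketch on a made-up node table; the column names mirror `nodes_tbl`, but the rows are illustrative only.

```r
library(dplyr)
library(tibble)

# Toy node table (values are made up)
toy_nodes <- tibble(
  `Node Type`  = c("Song", "Song", "Person", "Person", "RecordLabel"),
  genre        = c("Oceanus Folk", "Lo-Fi Electronica", NA, NA, NA),
  release_date = c("2017", "2026", NA, NA, NA),
  stage_name   = c(NA, NA, "Stage B", NA, NA)
)

# Share of non-missing values per attribute within each node type
attr_coverage <- toy_nodes %>%
  group_by(`Node Type`) %>%
  summarise(across(everything(), ~ mean(!is.na(.x))), .groups = "drop")

attr_coverage
```

A coverage of 0 for an attribute within a node type suggests the attribute does not apply to that type, while values between 0 and 1 point to data that is genuinely incomplete.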
2.2.3. Initial EDA
ggplot(data = edges_tbl,
aes(y = `Edge Type`)) +
geom_bar()
The bar chart above shows the distribution of edge types in the music relationship network. The most common type is PerformerOf, indicating that the data mostly captures who performed which work. Other frequent types include ComposerOf, LyricistOf, and ProducerOf, highlighting the importance of creative and production roles. In contrast, relationships such as MemberOf and DirectlySamples are less common, suggesting these connections are either rarer or less documented.
ggplot(data = nodes_tbl,
aes(y = `Node Type`)) +
geom_bar()
This bar chart displays the distribution of different node types in the music network dataset. The most common type is Person, with a count far exceeding other categories, indicating a strong focus on individual artists, producers, and contributors. Songs also appear in large numbers, highlighting the dataset’s emphasis on works being created or performed. Other types like Albums, RecordLabels, and MusicalGroups are present but in significantly smaller quantities.
3. Creating Knowledge Graph
3.1. Mapping from node id to row index
id_map <- tibble(id = nodes_tbl$id,
                 index = seq_len(nrow(nodes_tbl)))
3.2. Map source and target IDs to row indices
edges_tbl <- edges_tbl %>%
  left_join(id_map, by = c("source" = "id")) %>%
  rename(from = index) %>%
  left_join(id_map, by = c("target" = "id")) %>%
  rename(to = index)
3.3. Filter out any unmatched (invalid) edges
edges_tbl <- edges_tbl %>%
  filter(!is.na(from), !is.na(to))
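The mapping matters because the node ids start at 0, while `tbl_graph()` reads integer `from` and `to` values as row positions in the node table. A minimal sketch of steps 3.1 to 3.3 on made-up data, including one edge that points at a node id that does not exist (the `toy_` names are hypothetical):

```r
library(dplyr)
library(tibble)

# Toy nodes with zero-based ids, and one edge pointing at a missing id (99)
toy_nodes <- tibble(id = c(0L, 1L, 2L), name = c("A", "B", "C"))
toy_edges <- tibble(source = c(0L, 2L, 1L), target = c(1L, 0L, 99L))

# Lookup from node id to row position
toy_id_map <- tibble(id = toy_nodes$id, index = seq_len(nrow(toy_nodes)))

# Translate source and target ids into row positions
toy_edges_mapped <- toy_edges %>%
  left_join(toy_id_map, by = c("source" = "id")) %>%
  rename(from = index) %>%
  left_join(toy_id_map, by = c("target" = "id")) %>%
  rename(to = index)

# Edges whose endpoint has no matching node get NA and are dropped
toy_edges_valid <- toy_edges_mapped %>%
  filter(!is.na(from), !is.na(to))

toy_edges_valid
```

Comparing `nrow()` before and after the filter shows how many edges were discarded as invalid.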
3.4. Creating tidygraph
graph <- tbl_graph(nodes = nodes_tbl,
                   edges = edges_tbl,
                   directed = t_data$directed)
class(graph)
[1] "tbl_graph" "igraph"
4. Visualising the knowledge graph
set.seed(1234)
4.1. Visualising the whole graph
ggraph(graph, layout = "fr") +
geom_edge_link(alpha = 0.3,
colour = "gray") +
geom_node_point(aes(color = `Node Type`),
size = 4) +
geom_node_text(aes(label = name),
repel = TRUE,
size = 2.5) +
theme_void()
4.2. Visualising the sub-graph
4.2.1. Filtering edges to only “MemberOf”
graph_memberof <- graph %>%
  activate(edges) %>%
  filter(`Edge Type` == "MemberOf")
4.2.2. Extracting only connected nodes (i.e., used in these edges)
used_node_indices <- graph_memberof %>%
  activate(edges) %>%
  as_tibble() %>%
  select(from, to) %>%
  unlist() %>%
  unique()
4.2.3. Keeping only those nodes
graph_memberof <- graph_memberof %>%
  activate(nodes) %>%
  mutate(row_id = row_number()) %>%
  filter(row_id %in% used_node_indices) %>%
  select(-row_id) # optional cleanup
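As an alternative to collecting the used node indices by hand, tidygraph's `node_is_isolated()` helper can drop nodes that have no edges left after the edge filter. This is a sketch of that alternative on a made-up graph, not the method used above; the toy node and edge values are hypothetical.

```r
library(tidygraph)
library(dplyr)
library(tibble)

# Toy graph: two people in a group, one of whom also performs a song
toy_nodes <- tibble(name = c("P1", "P2", "Group", "Song"))
toy_edges <- tibble(
  from = c(1L, 2L, 1L),
  to   = c(3L, 3L, 4L),
  `Edge Type` = c("MemberOf", "MemberOf", "PerformerOf")
)

toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE)

toy_memberof <- toy_graph %>%
  activate(edges) %>%
  filter(`Edge Type` == "MemberOf") %>%
  # After edge filtering, drop nodes with no remaining edges
  activate(nodes) %>%
  filter(!node_is_isolated())

# Names of the nodes that remain in the sub-graph
member_names <- toy_memberof %>%
  activate(nodes) %>%
  as_tibble() %>%
  pull(name)

member_names
```

Both approaches keep the same nodes as long as the original graph has no self-contained isolated nodes that should be retained.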
4.2.4. Plotting the sub-graph
ggraph(graph_memberof,
layout = "fr") +
geom_edge_link(alpha = 0.5,
colour = "gray") +
geom_node_point(aes(color = `Node Type`),
size = 1) +
geom_node_text(aes(label = name),
repel = TRUE,
size = 2.5) +
theme_void()
5. Sailor Shift’s Career Connections
5.1. The contributors who shaped the modern Sailor Shift
A singer’s journey to fame is never a solitary one. Sailor has been accompanied by many — fellow singers, producers, instrumentalists, composers, and others who helped shape her path.
# Sailor Shift's index
sailor_idx <- which(nodes_tbl$name == "Sailor Shift")
# Indices of Sailor Shift's works
perf_edges <- graph %>%
  activate(edges) %>%
  as_tibble() %>%
  filter(`Edge Type` == "PerformerOf", from == sailor_idx)
sailor_works_idx <- perf_edges %>% pull(to) %>% unique()
focus_idx1 <- unique(c(sailor_idx, sailor_works_idx))
# Keep edges that 'influence' Sailor Shift's works
influence_types1 <- c("ComposerOf", "ProducerOf", "LyricistOf", "CoverOf")
graph_influence1 <- graph %>%
  activate(edges) %>%
  filter(
    `Edge Type` %in% influence_types1,
    to %in% focus_idx1
  )
# Extract Nodes
used_node_indices1 <- graph_influence1 %>%
  activate(edges) %>%
  as_tibble() %>%
  select(from, to) %>%
  unlist() %>%
  unique()
# Keep Nodes
graph_influence1 <- graph_influence1 %>%
  activate(nodes) %>%
  mutate(.row = row_number()) %>%
  filter(.row %in% used_node_indices1) %>%
  select(-.row)
# Plot
ggraph(graph_influence1, layout = "fr") +
geom_edge_link(aes(color = `Edge Type`),
arrow = arrow(length = unit(4, "pt"), type = "closed"),
end_cap = circle(3, "pt"),
start_cap = circle(3, "pt"),
width = 0.5,
alpha = 0.6,
show.legend = TRUE) +
geom_node_point(aes(color = `Node Type`),
size = 2) +
geom_node_text(aes(label = name),
size = 2.5,
repel = TRUE,
max.overlaps = Inf) +
scale_edge_colour_brewer(palette = "Set2",
name = "Edge Type") +
scale_color_manual(values = c(
"Person" = "#377EB8",
"Album" = "#E41A1C",
"RecordLabel" = "#4DAF4A"
),
name = "Node Type") +
theme_void() +
theme(
legend.position = "right",
legend.title = element_text(size = 10),
legend.text = element_text(size = 8),
plot.margin = margin(5, 5, 5, 5)
)
This network diagram places Sailor Shift at its center and reveals the diverse teams behind each album. By mapping the ComposerOf, ProducerOf, and LyricistOf relationships, it shows which composers, producers, and record labels have shaped her work. In the visualization, Ewan MacRae has the greatest influence on her discography: he composed Oceanbound alone and co-created Coral Beats with Freya Lindholm and Astrid Nørgaard, leaving a mark on two albums, more than any other contributor.
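A reading like this can be backed by a simple tally of how many distinct works each contributor is credited on. The sketch below uses placeholder names and works rather than the real data, and assumes the contributor edges have already been joined to node names.

```r
library(dplyr)
library(tibble)

# Toy contributor edges; names are placeholders, not from the dataset
toy_contrib <- tibble(
  contributor = c("Contributor A", "Contributor A", "Contributor B",
                  "Contributor C", "Contributor A"),
  work        = c("Album X", "Album Y", "Album Y", "Album Y", "Album X"),
  `Edge Type` = c("ComposerOf", "ComposerOf", "ProducerOf",
                  "LyricistOf", "ProducerOf")
)

# Rank contributors by distinct works, then by total credited roles
contributor_rank <- toy_contrib %>%
  group_by(contributor) %>%
  summarise(n_works = n_distinct(work), n_roles = n(), .groups = "drop") %>%
  arrange(desc(n_works), desc(n_roles))

contributor_rank
```

Ranking by distinct works first avoids over-weighting someone who holds several roles on a single album.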
5.2. Who Sailor Shift has influenced
Over the course of her career, Sailor has not only absorbed influences from others; her work has also begun to inspire other artists, extending her creative reach beyond her immediate circle.
# Sailor's works
layer1_targets <- perf_edges %>%
  pull(to)
# Works influenced by Sailor's works
influence_types2 <- c("DirectlySamples", "InStyleOf",
                      "LyricalReferenceTo", "InterpolatesFrom")
layer2_targets <- graph %>%
  activate(edges) %>%
  as_tibble() %>%
  filter(`Edge Type` %in% influence_types2,
         from %in% layer1_targets) %>%
  pull(to)
# Creators of those influenced works
creator_types <- c("ComposerOf", "ProducerOf", "LyricistOf")
graph_sub2 <- graph %>%
  activate(edges) %>%
  filter(
    (`Edge Type` == "PerformerOf" & from == sailor_idx) |
    (`Edge Type` %in% influence_types2 & from %in% layer1_targets) |
    (`Edge Type` %in% creator_types & to %in% layer2_targets)
  )
used_nodes2 <- graph_sub2 %>%
  activate(edges) %>%
  as_tibble() %>%
  select(from, to) %>%
  unlist() %>%
  unique()
graph_sub2 <- graph_sub2 %>%
  activate(nodes) %>%
  mutate(.row = row_number()) %>%
  filter(.row %in% used_nodes2) %>%
  select(-.row)
ggraph(graph_sub2, layout = "fr") +
geom_edge_link(aes(color = `Edge Type`),
arrow = arrow(length = unit(3, "pt"), type = "closed"),
end_cap = circle(2.5, "pt"),
start_cap = circle(2.5, "pt"),
width = 0.6,
alpha = 0.7) +
geom_node_point(aes(color = `Node Type`), size = 3) +
geom_node_text(aes(label = name), repel = TRUE, size = 2.5, max.overlaps = Inf) +
scale_edge_colour_manual(values = c(
PerformerOf = "#8DD3C7",
DirectlySamples = "#FB8072",
InStyleOf = "#80B1D3",
LyricalReferenceTo = "#FDB462",
InterpolatesFrom = "#B3DE69",
ComposerOf = "#FCCDE5",
ProducerOf = "#BEBADA",
LyricistOf = "#FFED6F"
),
name = "Relation") +
scale_color_manual(values = c(
Person = "#377EB8",
Album = "#E41A1C",
Song = "#4DAF4A",
RecordLabel = "#984EA3",
MusicalGroup = "#FF7F00"
),
name = "Node Type") +
theme_void() +
theme(
legend.position = "right",
legend.title = element_text(size = 10),
legend.text = element_text(size = 8)
)
This visualization is organized in four layers. Sailor Shift (blue) sits at the center, surrounded by her own albums and songs (red and green). The third layer holds the songs that directly sample, stylistically echo, lyrically reference, or interpolate her work (green), and the outermost layer holds the composers, producers, and lyricists (blue) behind those derivative pieces. Counting connections, Wei Zhao stands out as the most heavily influenced creator: credited on two separate derivative tracks, they are the single individual most shaped by Sailor Shift’s musical legacy.
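The layered selection can be traced step by step on a small example. The sketch below is an illustration on made-up row indices, with node 1 standing in for the artist. It follows the same filter direction as the code in this section (edges starting at the artist's works); whether that direction matches the intended influence semantics depends on how the dataset defines source and target, which is not verified here.

```r
library(dplyr)
library(tibble)

# Toy edges using row indices; node 1 plays the role of the artist
toy_edges <- tibble(
  from = c(1L, 1L, 2L, 3L, 6L, 7L, 8L),
  to   = c(2L, 3L, 4L, 5L, 4L, 4L, 5L),
  `Edge Type` = c("PerformerOf", "PerformerOf", "InStyleOf", "DirectlySamples",
                  "ComposerOf", "ProducerOf", "LyricistOf")
)
artist_idx <- 1L

# Layer 1: works performed by the artist
layer1 <- toy_edges %>%
  filter(`Edge Type` == "PerformerOf", from == artist_idx) %>%
  pull(to)

# Layer 2: works linked to layer 1 by an influence-type edge
layer2 <- toy_edges %>%
  filter(`Edge Type` %in% c("InStyleOf", "DirectlySamples"),
         from %in% layer1) %>%
  pull(to)

# Layer 3: creators credited on the layer-2 works, with credit counts
layer3 <- toy_edges %>%
  filter(`Edge Type` %in% c("ComposerOf", "ProducerOf", "LyricistOf"),
         to %in% layer2) %>%
  count(from, name = "n_credits", sort = TRUE)

layer3
```

The `n_credits` column is the connection frequency used to single out the most heavily linked creator.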
5.3. Sailor Shift’s influence on the Oceanus Folk community
# Sailor's Index
sailor_idx <- which(nodes_tbl$name == "Sailor Shift")
# Sailor's works
creative_edge_types <- c("PerformerOf")
perf_edges <- graph %>%
  activate(edges) %>%
  as_tibble() %>%
  filter(`Edge Type` %in% creative_edge_types, from == sailor_idx)
sailor_works_idx <- perf_edges %>% pull(to) %>% unique()
nodes_tbl[sailor_works_idx, ]
# A tibble: 26 × 10
`Node Type` name single release_date genre notable id written_date
<chr> <chr> <lgl> <chr> <chr> <lgl> <int> <chr>
1 Album Tidal Pop W… NA 2028 Ocea… TRUE 17272 2027
2 Album Salty Dreams NA 2030 Ocea… TRUE 17273 2029
3 Album The Current… NA 2032 Ocea… TRUE 17274 2031
4 Album Coral Beats NA 2034 Ocea… TRUE 17275 2033
5 Album Tides & Bal… NA 2036 Ocea… TRUE 17276 2035
6 Album Oceanbound NA 2038 Ocea… TRUE 17277 2037
7 Album Echoes of t… NA 2040 Ocea… TRUE 17278 2039
8 Song High Tide H… TRUE 2028 Ocea… FALSE 17279 <NA>
9 Song Electric Ee… FALSE 2028 Ocea… TRUE 17280 <NA>
10 Song Sun-Drenche… FALSE 2028 Ocea… FALSE 17281 <NA>
# ℹ 16 more rows
# ℹ 2 more variables: stage_name <chr>, notoriety_date <chr>
# Oceanus Folk Community works
oceanus_works_idx <- nodes_tbl %>%
  mutate(idx = row_number()) %>%
  filter(genre == "Oceanus Folk") %>%
  pull(idx)
# Combine all nodes
focus_idx <- unique(c(sailor_works_idx, oceanus_works_idx))
nodes_tbl[focus_idx, ]
# A tibble: 305 × 10
`Node Type` name single release_date genre notable id written_date
<chr> <chr> <lgl> <chr> <chr> <lgl> <int> <chr>
1 Album Tidal Pop W… NA 2028 Ocea… TRUE 17272 2027
2 Album Salty Dreams NA 2030 Ocea… TRUE 17273 2029
3 Album The Current… NA 2032 Ocea… TRUE 17274 2031
4 Album Coral Beats NA 2034 Ocea… TRUE 17275 2033
5 Album Tides & Bal… NA 2036 Ocea… TRUE 17276 2035
6 Album Oceanbound NA 2038 Ocea… TRUE 17277 2037
7 Album Echoes of t… NA 2040 Ocea… TRUE 17278 2039
8 Song High Tide H… TRUE 2028 Ocea… FALSE 17279 <NA>
9 Song Electric Ee… FALSE 2028 Ocea… TRUE 17280 <NA>
10 Song Sun-Drenche… FALSE 2028 Ocea… FALSE 17281 <NA>
# ℹ 295 more rows
# ℹ 2 more variables: stage_name <chr>, notoriety_date <chr>
# Influence Types
influence_types3 <- c(
  "DirectlySamples",
  "InStyleOf",
  "LyricalReferenceTo",
  "InterpolatesFrom",
  "CoverOf"
)
# Filter Edges
graph_3 <- graph %>%
  activate(edges) %>%
  filter(`Edge Type` %in% influence_types3)
# Extracting Nodes
used_node_indices3 <- graph_3 %>%
  activate(edges) %>%
  as_tibble() %>%
  select(from, to) %>%
  unlist() %>%
  unique()
# Keep Nodes
graph_3 <- graph_3 %>%
  activate(nodes) %>%
  mutate(row_id = row_number()) %>%
  filter(row_id %in% focus_idx) %>%
  select(-row_id) # optional cleanup
# Add label
graph_3 <- graph_3 %>%
  activate(nodes) %>%
  mutate(is_sailor_work = ifelse(name %in% nodes_tbl$name[sailor_works_idx],
                                 "Sailor's Work", "Other"))
# Plotting
ggraph(graph_3, layout = "fr") +
geom_edge_link(alpha = 0.5, colour = "gray") +
geom_node_point(aes(color = is_sailor_work), size = 1.5) +
theme_void()
Sailor Shift has influenced creators in the Oceanus Folk community primarily through indirect inspiration. Her works, though few in number, are spread across different parts of the network, which suggests that multiple creators have referenced or sampled them. She does not appear to collaborate repeatedly with specific individuals, but her influence spans stylistic clusters, indicating a broad and decentralized artistic impact.
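The plot gives a visual impression only; a count of how many influence edges touch her works would make the claim easier to check. A minimal sketch on made-up row indices follows, where `artist_works` is a hypothetical stand-in for the indices of her works.

```r
library(dplyr)
library(tibble)

# Toy influence edges between works (row indices are made up)
toy_edges <- tibble(
  from = c(10L, 11L, 12L, 13L, 2L),
  to   = c(1L, 1L, 2L, 14L, 15L),
  `Edge Type` = c("InStyleOf", "CoverOf", "DirectlySamples",
                  "InStyleOf", "InterpolatesFrom")
)
artist_works <- c(1L, 2L)  # hypothetical row indices of the artist's works

# Classify each edge by how it relates to the artist's works, then count
influence_summary <- toy_edges %>%
  mutate(direction = case_when(
    to %in% artist_works   ~ "points to artist's work",
    from %in% artist_works ~ "starts at artist's work",
    TRUE                   ~ "elsewhere in community"
  )) %>%
  count(direction)

influence_summary
```

Dividing each count by the total number of edges gives the share of community influence links that involve her works.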