Analyzing Deep Rock Galactic Daily Missions Data Over July 2024
Author
invictus
Published
July 31, 2024
Modified
August 5, 2024
Introduction
Rocks and stones may break my bones, but beer will never hurt me. Gee, I’m feeling poetic today!
Anyone who plays Deep Rock Galactic (DRG) must have heard that phrase at least once. It’s my personal favorite.
If you’re not familiar with the game, DRG is an FPS with procedurally generated sandbox missions, where you play as a miner living in space. You’ll be diving 1,000 km underground to accomplish a given mission. These missions are randomly generated (or are they?) every 30 minutes.
Interestingly, these missions are the same for all players around the world, as long as you play in the same season.
For every mission, there’s a small chance that it comes with a mutator that changes the nature of the gameplay. The most sought after is the ‘Double XP’ mutator, which, as you can guess, doubles the amount of XP you obtain.
XP is very important during the early game, because your level determines which weapons you can unlock and which upgrades you can purchase.
Therefore, one of the reasons that inspired me to conduct this study is to discover the pattern behind the ‘Double XP’ mutator. I want to know whether it’s completely random or whether I can find out at which hours it usually appears.
I’m a beginner myself, with only about 40 hours of gameplay, so the results of this study will benefit me very much.
Acknowledgement
This work could not have been done without the massive effort by rolfosian, who has written a script to extract the mission data stored in the game. Additionally, he runs a site that displays the currently active missions, updated in real time: https://doublexp.net/. Feel free to check the source code on his GitHub repo.
Objective
I haven’t run it myself on my PC. Fortunately, I discovered that he stores many of the collected missions in JSON format on the site. Not long after, I wrote simple Python scripts to download, parse, and convert them to CSV.
It was a lot of fun. Now, I want to rewrite the script in R and finally analyze the data. The objective is to get the full daily mission data over July 2024, and then hopefully find a pattern in ‘Double XP’. I’ll peek at other insights as well, because, why not?
Collecting Data
Let’s fire up our Swiss Army knife, tidyverse.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
From rolfosian’s project source code, I discovered that the JSON files are stored at https://doublexp.net/static/json/bulkmissions/{yyyy-mm-dd}.json. So let’s set that up as the base URL.
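A minimal sketch of that setup (json_base_url is the name the later chunks use):
# Base URL for the bulk mission JSON files; file names follow yyyy-mm-dd.json
json_base_url <- 'https://doublexp.net/static/json/bulkmissions/'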
The base URL will be used to generate the download link for each JSON file. Since the file names are formatted as ISO dates, they’re easy to generate using lubridate.
library(lubridate)

start_date <- ymd('2024-07-01')
end_date <- ymd('2024-07-31')
date_list <- seq(start_date, end_date, by = 'day')
As simple as that. Now we have a list of dates from July 1 to July 31.
We can also use paste0() to combine them with the base URL to get the download links.
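The file names themselves come first; a sketch, assuming each is simply the ISO date with a .json extension:
# Build file names like '2024-07-01.json' from the dates
filename_list <- paste0(date_list, '.json')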
json_urls <- paste0(json_base_url, filename_list)
Now we can use this list to download all the JSON files from doublexp.net. Let’s put a timer on it too, because why not 😆.
dir.create('json')
Warning in dir.create("json"): 'json' already exists
library(tictoc)
tic('Download all missions json for July 2024')
mapply(download.file, json_urls, paste0('json/', filename_list))
toc()
trying URL 'https://doublexp.net/static/json/bulkmissions/2024-07-01.json'
Content type 'application/json' length 261344 bytes (255 KB)
==================================================
downloaded 255 KB
...
https://doublexp.net/static/json/bulkmissions/2024-07-01.json https://doublexp.net/static/json/bulkmissions/2024-07-01.json
...
Download all missions json for July 2024: 64.759 sec elapsed
64 seconds. Not bad. Now let’s parse the files into a data frame. But first, we need to understand the structure of the JSON.
Processing Data
Parsing one JSON
Due to its loosely structured nature, JSON can be messy and hard to parse. Fortunately, the JSON we have here is not too complex, and all the files are structured the same way. So, we only need to figure out how to parse one JSON file in order to parse all of them.
Let’s take a look at the JSON file.
library(jsonlite)
Attaching package: 'jsonlite'
The following object is masked from 'package:purrr':
flatten
json_data <- read_json('./json/2024-07-01.json')
json_data |> length()
[1] 50
json_data[c(1:3)] |> glimpse()
List of 3
$ 2024-07-01T00:00:00Z:List of 2
..$ timestamp: chr "2024-07-01T00:00:00Z"
..$ Biomes :List of 6
.. ..$ Glacial Strata :List of 3
.. ..$ Crystalline Caverns :List of 4
.. ..$ Sandblasted Corridors :List of 5
.. ..$ Radioactive Exclusion Zone:List of 4
.. ..$ Dense Biozone :List of 5
.. ..$ Hollow Bough :List of 3
$ 2024-07-01T00:30:00Z:List of 2
..$ timestamp: chr "2024-07-01T00:30:00Z"
..$ Biomes :List of 5
.. ..$ Crystalline Caverns :List of 4
.. ..$ Azure Weald :List of 4
.. ..$ Fungus Bogs :List of 7
.. ..$ Radioactive Exclusion Zone:List of 4
.. ..$ Dense Biozone :List of 4
$ 2024-07-01T01:00:00Z:List of 2
..$ timestamp: chr "2024-07-01T01:00:00Z"
..$ Biomes :List of 5
.. ..$ Glacial Strata :List of 5
.. ..$ Crystalline Caverns :List of 3
.. ..$ Salt Pits :List of 9
.. ..$ Azure Weald :List of 4
.. ..$ Radioactive Exclusion Zone:List of 4
So, the JSON has 4 levels:
Timestamps
Biomes
The biomes themselves
Missions for each biome (note that the fields under Biomes are also lists)
I’ve explored the JSON with View(), so I have a pretty rough idea of the general structure. I also found there are 2 fields with a different structure at the end of the list.
json_data[c(49:50)] |> glimpse()
List of 2
$ dailyDeal:List of 6
..$ ResourceAmount: int 138
..$ ChangePercent : num 57
..$ DealType : chr "Buy"
..$ Credits : int 8908
..$ Resource : chr "Enor Pearl"
..$ timestamp : chr "2024-07-01T00:00:00Z"
$ ver : int 4
The ver field is probably just an internal variable for doublexp.net, so it’s safe to remove. dailyDeal, however, is useful data we can use to analyze the game’s daily deal. But we need to process it separately so it doesn’t interfere with the mission parsing.
Let’s store it in a separate variable, dailyDeal.
dailyDeal <- json_data$dailyDeal
Now, we can remove both from json_data. I’m not comfortable mutating our JSON directly, so let’s assign the result to a new variable.
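A minimal sketch of that step, dropping dailyDeal and ver by name:
# Copy every timestamped entry into a new variable, leaving dailyDeal and ver out
drg_missions <- json_data[!names(json_data) %in% c('dailyDeal', 'ver')]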
We’re good to go. It’s time to unravel this JSON into a beautiful data frame.
drg_missions <- tibble(drg_missions)
Unfortunately, here comes the hard part: unnesting the JSON. tidyr provides a powerful set of tools for unnesting data, yet I was barely able to wrap my head around them.
Basically, there are two main functions we’ll use:
unnest_wider(): To unpack a list into columns
unnest_longer(): To unpack a list into rows
First, we’ll use unnest_wider() to turn the timestamps into a new column.
drg_missions |> unnest_wider(drg_missions)
# A tibble: 48 × 2
timestamp Biomes
<chr> <list>
1 2024-07-01T00:00:00Z <named list [6]>
2 2024-07-01T00:30:00Z <named list [5]>
3 2024-07-01T01:00:00Z <named list [5]>
4 2024-07-01T01:30:00Z <named list [5]>
5 2024-07-01T02:00:00Z <named list [6]>
6 2024-07-01T02:30:00Z <named list [5]>
7 2024-07-01T03:00:00Z <named list [5]>
8 2024-07-01T03:30:00Z <named list [5]>
9 2024-07-01T04:00:00Z <named list [6]>
10 2024-07-01T04:30:00Z <named list [5]>
# ℹ 38 more rows
Then, we’ll call unnest_longer() twice to unwrap the Biomes and the biomes themselves (levels 2 and 3).
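A sketch of the whole unnesting chain, assuming the biome names are kept in a Biome column via indices_to, plus a final unnest_wider() to spread each mission’s fields into columns:
drg_missions <- drg_missions |>
  unnest_wider(drg_missions) |>                  # timestamps into a column
  unnest_longer(Biomes, indices_to = 'Biome') |> # one row per biome
  unnest_longer(Biomes) |>                       # one row per mission
  unnest_wider(Biomes)                           # mission fields into columns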
At this point, the data frame is almost done. It looks exactly like how we want it to be, except that 2 columns are still lists: MissionWarnings and included_in. We could call unnest_longer() on them, but that would duplicate rows, since those columns would be the only thing differing between them. The alternative is to use paste() to join each list into one string, separated by commas.
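Something along these lines, assuming the column is called included_in at this point:
# Collapse each list of season tags into a single comma-separated string
drg_missions <- drg_missions |>
  mutate(included_in = map_chr(included_in, \(x) paste(x, collapse = ', ')))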
Perfect! Later on, we’ll use included_in to filter the missions available to us based on the season.
The last thing to convert is MissionWarnings. We could combine them with paste() as well, but they’re valuable data we can use to analyze the missions. It’s better to treat them as variables and separate them into two columns with unnest_wider().
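A sketch of that step; the rename() is an assumption, chosen to match the column names used later:
# Spread the warnings list into two columns, then give them friendlier names
drg_missions <- drg_missions |>
  unnest_wider(MissionWarnings, names_sep = '_') |>
  rename(MissionWarning1 = MissionWarnings_1,
         MissionWarning2 = MissionWarnings_2)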
You’ve probably noticed that I reassign dailyDeal multiple times instead of just chaining the pipe operator. Please don’t ask why 😅. I spent more than 30 minutes debugging why the pipe operator messes up the code. In a nutshell, reassigning preserves the ‘dailyDeal’ name inside the column, while the pipe operator does the opposite.
Previously, this was supposed to be one function returning a list of 2 tibbles, but that caused a lot of unexpected headaches related to the issue I just mentioned. So I decided to write separate functions to parse dailyDeal and the missions.
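The chunk that wrote the parsed daily deals to CSV and read them back isn’t shown; presumably something like this produced the message below (the file name is an assumption):
# Hypothetical file name; read the combined July daily deals back in
dailyDeal_df_combined <- read_csv('dailyDeal_2024-07.csv')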
Rows: 31 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): DealType, Resource
dbl (3): ResourceAmount, ChangePercent, Credits
dttm (1): timestamp
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Previously, I mentioned that we’d only analyze mission data that is usually available to most people, which is the unseasoned missions. In this data, those are the rows with s0 in included_in, so we can filter on that.
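A sketch of that filter; missions_df is a placeholder name, and str_detect() works here because included_in is now a comma-separated string:
# Keep only missions available to unseasoned players (tagged 's0')
missions_df <- drg_missions |>
  filter(str_detect(included_in, 's0'))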
Since we manually extracted and parsed the data from the JSON files, R didn’t assume any data types and just rendered everything as chr, which means a lot of columns need to be converted.
Factors: Biome, PrimaryObjective, SecondaryObjective, MissionMutator, MissionWarning1, and MissionWarning2 should be factors, since they’re categorical variables with a limited set of values. Complexity and Length are a bit tricky, since they’re ordinal variables, which can be treated as either qualitative or quantitative. Claude Sonnet 3.5, however, argues that since we don’t know the exact interval between 1 and 2, or 2 and 3, they should be treated as categorical. Sonnet 3.5 has helped me a lot, so I’ll trust her 🤣.
Based on the information provided, your ‘Complexity Level’ variable is an ordinal categorical variable. Even though it’s represented by numbers (1, 2, 3), it’s not truly numeric in nature, as the intervals between levels may not be equal.
Characters: CodeName can stay as chr, since it doesn’t hold any value other than being a randomly generated name. included_in should probably be a factor too, but its structure isn’t standardized yet, and we don’t need to deal with it anyway.
Integers: id should be an integer. Oh wait, it already is! I didn’t know that.
Date: This one is obvious. The timestamp column should be a date-time object. Otherwise, we can’t do any date computations with it.
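Putting it all together, a sketch of the conversions (the exact column names are assumptions based on the discussion above):
missions_df <- missions_df |>
  mutate(
    # Categorical variables, including the ordinal Complexity and Length
    across(c(Biome, PrimaryObjective, SecondaryObjective, MissionMutator,
             MissionWarning1, MissionWarning2, Complexity, Length), as.factor),
    # Parse the ISO 8601 strings into proper date-time objects
    timestamp = ymd_hms(timestamp)
  )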
Now it looks much better. We can reorder the bars in descending order with fct_infreq() to see the highest and lowest, but honestly, they don’t seem to differ much in value, except for Blood Sugar.
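The chunk that builds the weekly Double XP summary below isn’t shown; a rough sketch, assuming a data frame already filtered to Double XP missions with a Day column (double_xp_df is a placeholder name):
# Hypothetical: average number of Double XP missions per weekday
double_xp_trend_weekly <- double_xp_df |>
  count(Day, name = 'Total_Occurences') |>
  mutate(WeekDay = wday(Day, label = TRUE)) |>
  group_by(WeekDay) |>
  summarize(mean = mean(Total_Occurences))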
# A tibble: 7 × 2
WeekDay mean
<ord> <dbl>
1 Sun 21.2
2 Mon 18.6
3 Tue 20.2
4 Wed 19
5 Thu 17
6 Fri 20.2
7 Sat 22.5
It looks like there isn’t much difference between the days. We can verify that by checking the SD.
double_xp_trend_weekly |> pull(mean) |> sd()
[1] 1.810584
Yeah, that’s minuscule. But still, let’s visualize it to get a better picture.
double_xp_trend_weekly |>
  ggplot(aes(WeekDay, mean, group = 1)) +
  geom_line() +
  geom_label(aes(label = mean)) +
  labs(x = 'Week Day', y = 'Average Occurrence (mean)',
       title = 'Average Occurrence (Mean) of Double XP Each Day')
So the peak is on Saturday and the trough is on Thursday. Still, if you think about it as a gamer, it really isn’t much of a difference, except maybe for Thursday. 20 Double XP missions a day is a lot. One game can range from 30 to 60 minutes, so even if you stay up all day, you can’t get all of them anyway.
When we zoom out and look at the pattern throughout the month, we can see a repeating up-and-down cycle that trends downward near the end of the month. I wish we had data from the other months to complete the pattern.
If we zoom in on the hours and sum all the occurrences for each hour, we can see that the peak is at 10:00 UTC. But is it really that much of a difference compared to the rest? As we did previously, we can use the mean instead to see how many occurrences there are on average.
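A sketch of that hourly aggregation, again using the placeholder double_xp_df:
# Hypothetical: average Double XP occurrences per hour of the day (UTC)
double_xp_df |>
  mutate(Hour = hour(timestamp)) |>
  count(Day, Hour, name = 'Total_Occurences') |>
  group_by(Hour) |>
  summarize(mean = mean(Total_Occurences)) |>
  ggplot(aes(Hour, mean)) +
  geom_col() +
  labs(x = 'Hour (UTC)', y = 'Average Occurrence (mean)')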
Ohhh, that’s unexpected. So on average, the highest is not at 10:00 UTC but at 01:00 UTC, which was quite low when we looked at the total count. This might indicate the presence of outliers.
Anyway, it doesn’t matter. The range is only about 1, from 1 to 2 occurrences. That’s meaningless when you consider the game duration per mission: once you finish one, a new mission will have been generated.
Summary
Insight 1: Double XP Pattern
Statistically, there are trends and patterns, peaks and troughs, in when Double XP is most likely to appear.
Unfortunately, from a gamer’s perspective, the difference is not significant. On the weekly scale, you don’t play the game all day; on the hourly scale, a new mission will be generated by the time you finish one.
In short, there is no strategy we gamers can use to find the right time to play. Fortunately, since we have the complete timestamps, we can use them to snipe the Double XP missions directly instead.
Insight 2: Mutators Occurrences
There’s another interesting insight, though. The hourly average is about 1.5. However, we know that the game’s missions refresh every 30 minutes, and there are about 20 missions at once. That means, on average, there’s at most about one Double XP mission out of roughly 20 in every cycle.
But remember, we filtered the data to only include missions that are Double XP. If we had included all mutators, it’s highly possible they’d show a similar pattern, considering the frequencies of all mutators are highly similar.
Appendix: Other Findings
Below, you can find other findings that aren’t related to Double XP, but might be interesting to know.
Biomes
Code
missions_df_other_findings |>
  count(Day, Biome, name = 'Total_Occurences') |>
  group_by(Biome) |>
  summarize(Daily_Occurence = mean(Total_Occurences)) |>
  arrange(desc(Daily_Occurence)) |>
  ggplot(aes(Daily_Occurence, fct_rev(fct_infreq(Biome, Daily_Occurence)))) +
  geom_col() +
  labs(title = 'Most Common Biomes',
       subtitle = 'Based on Mean Daily Occurrences',
       x = 'Daily Occurrence', y = 'Biomes')
Primary Objective
Code
missions_df_other_findings |>
  count(Day, PrimaryObjective, name = 'Total_Occurences') |>
  group_by(PrimaryObjective) |>
  summarize(Daily_Occurence = mean(Total_Occurences)) |>
  ggplot(aes(fct_infreq(PrimaryObjective, Daily_Occurence), Daily_Occurence)) +
  geom_col() +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5)) +
  labs(title = 'Most Common Primary Objectives',
       subtitle = 'Based on Mean Daily Occurrences',
       x = 'Primary Objective', y = 'Daily Occurrence')
Secondary Objective
Code
missions_df_other_findings |>
  mutate(SecondaryObjective = str_replace(SecondaryObjective, 'ApocaBlooms', 'Apoca Bloom')) |>
  count(Day, SecondaryObjective, name = 'Total_Occurences') |>
  group_by(SecondaryObjective) |>
  summarize(Daily_Occurence = mean(Total_Occurences)) |>
  ggplot(aes(fct_rev(fct_infreq(SecondaryObjective, Daily_Occurence)), Daily_Occurence)) +
  geom_col() +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5)) +
  labs(title = 'Most Common Secondary Objectives',
       subtitle = 'Based on Mean Daily Occurrences',
       x = 'Secondary Objective', y = 'Daily Occurrence')
Mission Warnings
Primary Warning
Code
missions_df_other_findings |>
  filter(!is.na(MissionWarning1)) |>
  count(Day, MissionWarning1, name = 'Total_Occurences') |>
  group_by(MissionWarning1) |>
  summarize(Daily_Occurence = mean(Total_Occurences)) |>
  arrange(desc(Daily_Occurence)) |>
  head(5) |>
  ggplot(aes(Daily_Occurence, fct_rev(fct_infreq(MissionWarning1, Daily_Occurence)))) +
  geom_col() +
  labs(title = 'Top Five Most Common Primary Mission Warnings',
       subtitle = 'Based on Mean Daily Occurrences',
       x = 'Daily Occurrence', y = 'Primary Mission Warnings')
Code
missions_df_other_findings |>
  filter(!is.na(MissionWarning1)) |>
  count(Day, MissionWarning1, name = 'Total_Occurences') |>
  group_by(MissionWarning1) |>
  summarize(Daily_Occurence = mean(Total_Occurences)) |>
  arrange(Daily_Occurence) |>
  head(5) |>
  ggplot(aes(Daily_Occurence, fct_infreq(MissionWarning1, Daily_Occurence))) +
  geom_col() +
  labs(title = 'Top Five Rarest Primary Mission Warnings',
       subtitle = 'Based on Mean Daily Occurrences',
       x = 'Daily Occurrence', y = 'Primary Mission Warnings')
Mission Warnings Combinations
Top 10
Code
missions_df_other_findings |>
  filter(!is.na(MissionWarning1), !is.na(MissionWarning2)) |>
  count(MissionWarning1, MissionWarning2, name = 'Total_Occurence') |>
  arrange(desc(Total_Occurence))
Daily Deal
Code
dailyDeal_df_combined |>
  ggplot(aes(timestamp, Credits, colour = DealType)) +
  geom_line() +
  labs(title = 'Trend of Daily Deal over July 2024',
       color = 'Deal Type', x = 'Date', y = 'Credits Offered')
Common Resource on Sale
Code
dailyDeal_df_combined |>
  count(Resource) |>
  ggplot(aes(fct_infreq(Resource, n), n)) +
  geom_col() +
  labs(title = 'Most Common Resource on Sale',
       subtitle = 'Based on total occurrences over July 2024',
       x = 'Resource', y = 'Total Occurrences')