
Companies House Charts (Day 2 of 5)

My exploratory data analysis of Companies House data in August 2020:

Chart: Number of active companies in the top 9 SIC codes (ranked by max-min spread)

Next, the limitations:

  • Only SICCode.SicText_1 is counted, so any additional SIC codes a company lists are ignored


And the code snippet:

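# packages: tidyverse (dplyr, forcats, ggplot2), lubridate for ymd(), tidyquant for theme_tq()
library(tidyverse)
library(lubridate)
library(tidyquant)

# assumes `data` has one row per company per snapshot, with the snapshot
# date in path_ym stored in a format that ymd() can parse
n_top <- 9  # number of SIC codes to keep, matching the "top 9" chart

# split out top n SICCode.SicText_1 by max-min spread
data %>%
  # count companies per snapshot and SIC code
  group_by(path_ym, SICCode.SicText_1) %>%
  summarise(count = n(), .groups = "drop") %>%
  # spread = largest minus smallest count for each SIC code
  group_by(SICCode.SicText_1) %>%
  mutate(
    spread_count = max(count) - min(count)
  ) %>%
  ungroup() %>%
  # keep the n_top codes with the largest spread, lump the rest into "Other" and drop it
  mutate(SICCode.SicText_1 = as_factor(SICCode.SicText_1) %>% fct_lump(n = n_top, w = spread_count)) %>%
  filter(SICCode.SicText_1 != "Other") %>%
  # order facets by the count at the latest snapshot
  mutate(SICCode.SicText_1 = SICCode.SicText_1 %>% fct_reorder2(path_ym, count)) %>%
  mutate(path_ym = ymd(path_ym)) %>%
  select(-spread_count) %>%
  arrange(desc(SICCode.SicText_1)) %>%
  # graph: one panel per SIC code, each with its own y-axis
  ggplot(aes(x = path_ym, y = count, color = SICCode.SicText_1)) +
  geom_line() +
  facet_wrap(~SICCode.SicText_1, scales = "free_y") +
  # formatting
  labs(
    title = "",
    x = "", y = ""
  ) +
  theme_tq() +
  theme(
    legend.position = "none"
  )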
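
To show what "ranked by max-min spread" means in practice, here is a minimal sketch on made-up numbers. The data frame `toy`, its column `sic` and the counts are all hypothetical and are not taken from the Companies House data. It illustrates that a large but stable code can rank below a smaller code whose count moved more:

```r
library(dplyr)

# Hypothetical monthly counts for three made-up SIC codes (illustration only)
toy <- data.frame(
  path_ym = rep(c("2020-06-01", "2020-07-01", "2020-08-01"), times = 3),
  sic     = rep(c("A", "B", "C"), each = 3),
  count   = c(100, 110, 130,   # A: spread 130 - 100 = 30
              500, 501, 502,   # B: spread 502 - 500 = 2
               40,  20,  45)   # C: spread 45 - 20 = 25
)

# Rank codes by the gap between their highest and lowest monthly count
spread_rank <- toy %>%
  group_by(sic) %>%
  summarise(spread_count = max(count) - min(count)) %>%
  arrange(desc(spread_count))

spread_rank
```

Here B is by far the largest code but lands last, because its count barely changes. The ranking favours absolute movement, so it will tend to surface codes that are both sizeable and changing.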


