Multimatics Insight

Cybersecurity Data Science Framework: Harnessing Advanced Cyber Analytics and Protection

Over the past decades, many researchers and practitioners have applied data science techniques to security challenges. There are now thousands of papers that use machine learning, statistical learning, data mining, and natural language processing techniques for complex security challenges such as intrusion detection, malware detection, phishing, and denial-of-service attacks. Cybersecurity has also become a major concern for organizations, partly because of the widespread damage caused by cyberattacks such as the data breaches at Equifax, Verizon, Gmail, and Instagram. In 2022, Indonesia also faced several data breaches affecting major commercial and governmental organizations, such as PLN, Telkom, and Kominfo, with data on more than 20 million users reportedly sold illegally on the internet.

The vast majority of companies do not even reveal that they have been attacked, because they fear lawsuits, damage to their reputation and stock price, and loss of consumers (Verma, 2018). Other challenges that have become major concerns include data overload, floods of false alerts, unknown unknowns, limited resources, and difficulties with integration and orchestration. Furthermore, attackers constantly adapt to detection methods and actively seek to exploit new vulnerabilities. Given the complexity and dynamic nature of cyber threats, automating data-driven analytics for cyber threat intelligence is highly desirable.
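To illustrate why false alerts can overwhelm analysts, consider a hypothetical detector (all numbers are illustrative assumptions): it catches 99% of real attacks (true positive rate, TPR), wrongly flags 1% of benign events (false positive rate, FPR), and only 0.1% of events (p) are actually malicious. Bayes' theorem gives the share of alerts that correspond to real attacks:

```latex
P(\text{attack} \mid \text{alert})
  = \frac{\mathrm{TPR}\cdot p}{\mathrm{TPR}\cdot p + \mathrm{FPR}\cdot (1-p)}
  = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.01 \times 0.999}
  = \frac{0.00099}{0.01098} \approx 0.090
```

Only about 9% of alerts would be genuine, so roughly nine in ten would be false, even though the detector looks accurate when judged by its error rates alone.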

Recently, predicting cybersecurity events has received increasing attention (Sun et al. 2018; Soska and Christin 2014; Sapienza et al. 2018). Cybersecurity aims to overcome challenging conditions and prevent threats that could harm organizations' assets. Sharing data and knowledge is important for making rapid progress, but it has been severely constrained in this setting. Hence, applying data science to cybersecurity cannot easily be ignored or put on hold. The distinctive domain of cybersecurity data science needs continual improvement to produce high-quality cybersecurity intelligence that aligns with the business's main objectives.

Data science offers methods to support cybersecurity assurance goals, spanning security threat identification, protection, detection, response, and recovery (NIST 2018).

Cybersecurity Data Science: The Key Concept

The key concept of Cybersecurity Data Science (CSDS) is the application of data science methods, including data engineering, data volume reduction, and discovery and detection, while optimizing digital infrastructure. The goal of CSDS is to optimize cybersecurity operations by focusing on data, applying quantitative, algorithmic, and probabilistic methods, attempting to quantify risk, providing targeted and effective alerts, and promoting inferential methods to classify behavioral patterns. CSDS methods thus derive from core data science activities, including analytics problem framing, exploratory data analysis, visualization, diagnostics, data preparation, data engineering, statistical analysis, feature engineering, machine learning, optimization, semantic analytics, and ensuring scientific rigor in data-focused inquiry (Davenport 2013; Kelleher and Tierney 2018; Sallam and Cearley 2012).

According to Sarker et al. (2020), CSDS is focused on security data, applies machine learning methods to quantify cyber risks, and ultimately seeks to optimize cybersecurity operations. Drawing on multiple sources that have proposed definitions, CSDS can be described as the practice of data science to assure the continuity of digital devices, systems, services, software, and agents in pursuit of the stewardship of systemic cybersphere stability, spanning technical, operational, organizational, economic, social, and political contexts.

CSDS consists of several main phases: security data collection, data preparation, machine-learning-based security modeling, and incremental learning and dynamism for smart cybersecurity systems and services (Sarker, 2020). In practice, CSDS enables organizations to gather hybrid data from relevant sources such as network activity, database activity, application activity, or user activity, and analytics on the latest data-driven patterns then inform the corresponding security solutions. The datasets on which cybersecurity data science is founded are often collections of information records that include a variety of attributes or features and related facts.
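The last phase, incremental learning, means the model keeps updating as new security data arrives rather than being retrained from scratch. As a minimal sketch of that idea (one simple way to realize it, assumed for illustration and not Sarker's method), the Python class below maintains a running mean and variance of a single numeric feature, such as hypothetical hourly failed-login counts, using Welford's online algorithm, and flags values more than `k` standard deviations from the baseline learned so far. The names `RunningBaseline` and `score_stream` and the defaults `k=3.0` and `min_count=10` are hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningBaseline:
    """Incrementally updated mean/variance of one numeric feature (Welford's algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Fold one new observation into the baseline without storing history.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; reported as 0.0 until two points are seen.
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0

    def is_anomalous(self, x: float, k: float = 3.0, min_count: int = 10) -> bool:
        # Flag x if it lies more than k standard deviations from the mean,
        # but only once enough history exists to trust the baseline.
        if self.count < min_count:
            return False
        sd = self.std()
        return sd > 0 and abs(x - self.mean) > k * sd


def score_stream(values, k=3.0):
    """Score each value against the baseline learned so far, then learn from it."""
    baseline = RunningBaseline()
    flags = []
    for v in values:
        flags.append(baseline.is_anomalous(v, k))
        baseline.update(v)
    return flags
```

With ten quiet hours followed by a spike, `score_stream([3, 4, 2, 5, 3, 4, 3, 2, 4, 3, 40])` flags only the last value. This sketch folds every value, including flagged ones, into the baseline; a real deployment might exclude or down-weight flagged points so that an attacker cannot slowly shift what counts as normal.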

Overall, CSDS is concerned with understanding diverse cyberattacks and devising corresponding defense strategies that preserve properties such as confidentiality, integrity, and availability.

However, as a hybrid domain, CSDS currently lacks a codified body of theory to guide the systematic diagnosis of security problems that leads to design prescriptions. Cybersecurity, one parent domain of CSDS, is heavily challenged by rapidly evolving threats and vulnerabilities driven by the proliferation and growing complexity of digital infrastructure, while data science, the other parent domain, often lacks codified theoretical and methodological consensus because its disciplinary boundaries are not firmly settled. To overcome these issues, several steps can be taken: improving data tasks through data engineering focused on the practical gathering and analysis of data, implementing targeted security alerts that emphasize knowledge alerts to minimize false alerts, and optimizing resources.
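One common way to move from raw alerts toward targeted ones is aggregation with a threshold: collapse repeated alerts for the same host and rule into a single group and escalate only groups that recur often enough. The sketch below assumes alerts arrive as `(host, rule, timestamp)` records; the `Alert` type, the `aggregate_alerts` function, the 300-second window, and the `min_count=3` threshold are all hypothetical choices for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    host: str
    rule: str
    timestamp: int  # seconds since epoch


def aggregate_alerts(alerts, window=300, min_count=3):
    """Group raw alerts by (host, rule) into fixed time windows and return
    only the groups that fire at least `min_count` times in one window.

    Each result is (host, rule, window_start, count), sorted for stable output.
    """
    buckets = defaultdict(int)
    for a in alerts:
        # Integer division assigns each alert to a fixed (tumbling) window.
        key = (a.host, a.rule, a.timestamp // window)
        buckets[key] += 1
    return sorted(
        (host, rule, bucket * window, count)
        for (host, rule, bucket), count in buckets.items()
        if count >= min_count
    )
```

For example, four `ssh-brute-force` alerts from the same host within five minutes become one escalated group, while an isolated `port-scan` alert is held back. Fixed windows keep the logic simple, but a burst that straddles a window boundary can be split into two sub-threshold groups; sliding windows avoid this at the cost of more bookkeeping.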

The Emerging Profession of Cybersecurity Data Science Practitioners

CSDS is now recognized as an emerging hybrid profession. The emerging CSDS professional domain is characterized by newness, rapid change, and complexity, spanning technical, methodological, and organizational contexts (Mongeau and Hajdasinski 2021). CSDS practitioners regard various security analytics goals as important, notably detecting events in progress and determining the cause of past events to support forensic investigation when necessary.

The functions of CSDS professionals include, but are not limited to:

1. Gather data and ensure its quality so that statistical and diagnostic tests can be performed.

2. Assist in designing, testing, and deploying data pipelines (e.g., ETL processes).

3. Supply cybersecurity professionals with informative and efficacious "detective analytics" results that aggregate explanatory and predictive indications.

4. Collaborate cross-functionally to enable rapid predictive detection and remediation.

5. Determine and validate CSDS organizational objectives by working with a range of stakeholders.
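The data-gathering, data-quality, and pipeline functions listed above can be made concrete with a minimal extract-transform-load sketch. It assumes a hypothetical log format of `timestamp user src_ip result` (with `result` either `OK` or `FAIL`), uses only Python's standard library with an in-memory SQLite database standing in for a real data store, and counts malformed lines instead of silently dropping them so that data quality can be checked downstream. The function names are illustrative, not part of any particular tool.

```python
import sqlite3


def extract(lines):
    """Extract: yield non-empty raw lines (here from an in-memory list)."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def transform(lines):
    """Transform: parse 'timestamp user src_ip result' lines into tuples,
    counting malformed records rather than silently keeping them."""
    records, rejected = [], 0
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or not parts[0].isdigit() or parts[3] not in ("OK", "FAIL"):
            rejected += 1
            continue
        ts, user, ip, result = parts
        records.append((int(ts), user, ip, result))
    return records, rejected


def load(conn, records):
    """Load: write parsed records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logins (ts INTEGER, user TEXT, ip TEXT, result TEXT)"
    )
    conn.executemany("INSERT INTO logins VALUES (?, ?, ?, ?)", records)
    conn.commit()


def failed_logins_by_ip(conn):
    """A simple downstream query: failed logins per source IP, most first."""
    return conn.execute(
        "SELECT ip, COUNT(*) FROM logins WHERE result = 'FAIL' "
        "GROUP BY ip ORDER BY COUNT(*) DESC, ip"
    ).fetchall()


def run_pipeline(lines):
    """Run extract -> transform -> load on an in-memory DB and return the summary."""
    conn = sqlite3.connect(":memory:")
    try:
        records, rejected = transform(extract(lines))
        load(conn, records)
        return failed_logins_by_ip(conn), rejected
    finally:
        conn.close()
```

Calling `run_pipeline` on a few sample lines, one of them malformed, returns the failed-login counts per source IP together with the number of rejected lines. In production, the extract step would read from log collectors and the load step would target a persistent store.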

With such high demand for CSDS professionals, many firms need to hire consultants to help their business units meet their cybersecurity data science needs. Consultants bring a great deal of information and expertise to the table. In applying CSDS, many practitioners find it challenging to make the best use of available tools to achieve specific goals. For example, white-hat tools (e.g., penetration-testing tools) often quickly end up being repurposed for black-hat purposes. Another challenge is adversarial machine learning: reverse engineering machine learning models and confusing or tricking them, for example by seeding the system with false data. To professionalize the CSDS domain, framing and hosting targeted training and credential programs is suggested. Research, instruction, and curriculum development on this subject would therefore be highly beneficial.

The Future Path of CSDS

From the organizational perspective, obstacles that may be encountered when applying CSDS include confusion, regulatory uncertainty, marketing hype, and a scarcity of CSDS practitioners or professionals. Uncertain regulation between organizations and related stakeholders complicates the data analytics process and hinders the integration of security systems and data. From the process perspective, obstacles include inherent costs, the volume of false alerts, decision uncertainty, and the scientific process. From the technology perspective, obstacles include data preparation, distinguishing normal from anomalous conditions, whether the organization owns its infrastructure or relies on shadow IT, and the lack of labeled incidents. These organizational obstacles can be addressed by establishing management-driven change and by conducting training and program governance.
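When labeled incidents are scarce, a simple unsupervised baseline can still separate normal from anomalous conditions by treating the bulk of the data as normal. The sketch below uses the modified z-score, which is based on the median and the median absolute deviation (MAD) and is therefore less distorted by the outliers it is trying to find than a mean-and-standard-deviation rule. The function name, the example feature (hypothetical hourly outbound byte counts per host), and the conventional cut-off of 3.5 are assumptions for illustration.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    No labeled incidents are needed: "normal" is simply the bulk of the data,
    summarized by the median and the median absolute deviation (MAD).
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values are identical; flag anything that differs.
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 rescales MAD so the score is comparable to a z-score for normal data.
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]
```

`mad_outliers([120, 130, 125, 118, 122, 127, 5000, 121])` returns `[6]`, the index of the 5000-byte hour. A flag here means only "unusual", not "malicious"; analysts still have to triage it, which is why such detectors are usually paired with targeted alerting rather than escalated directly.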

Organizations can also improve organizational process engineering, adopt structured risk quantification, and develop focused scientific processes. A follow-on step could be to work out, in detail, best practices for CSDS data preparation and exploration in an operational setting. A better grasp of the current issues and best practices in the CSDS area lays the groundwork for the development of the domain.
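Structured risk quantification can take many forms. One long-standing, simple approach is annualized loss expectancy: single loss expectancy (SLE) is asset value (AV) times exposure factor (EF), and annualized loss expectancy (ALE) is SLE times the annualized rate of occurrence (ARO). The sketch below applies it to compare risk before and after a security control; the `Scenario` class, `control_net_benefit`, and all figures in the example are hypothetical, and real programs often use richer probabilistic models.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    asset_value: float      # AV, in currency units
    exposure_factor: float  # EF, fraction of asset value lost per incident (0..1)
    annual_rate: float      # ARO, expected incidents per year

    def single_loss(self) -> float:
        # SLE = AV x EF
        return self.asset_value * self.exposure_factor

    def annual_loss(self) -> float:
        # ALE = SLE x ARO
        return self.single_loss() * self.annual_rate


def control_net_benefit(before: Scenario, after: Scenario, annual_cost: float) -> float:
    """Reduction in ALE bought by a control, minus the control's yearly cost."""
    return before.annual_loss() - after.annual_loss() - annual_cost
```

For an asset worth 1,000,000 with an exposure factor of 0.4 and an expected 0.5 incidents per year, ALE is 200,000; if a control costing 50,000 per year lowers the rate to 0.1, ALE falls to 40,000 and the control's net benefit is 110,000. This kind of comparison is one way to frame security spending as an investment weighed against expected risk.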


Analyzing cybersecurity data and building the right tools and processes to successfully protect against cybersecurity incidents goes beyond a simple set of functional requirements and knowledge about risks, threats, or vulnerabilities (Sarker, 2020). In light of the growing significance of cybersecurity, data science, and machine learning technologies, CSDS supports data-driven, intelligent decision making in smart cybersecurity systems and services. To improve business decision-making, organizations aim to address their expanding security exposures by striking the best balance between opportunity and expected risk.

The output of CSDS can be used in many application areas, such as Internet of Things (IoT) security, network security, cloud security, mobile and web applications, and other relevant cyber areas (Abomhara et al. 2015; Helali 2010; Ryoo et al. 2013; Jang-Jaccard and Nepal 2014). A key question about the future of the CSDS domain is the relative cost and benefit of hosting CSDS operations internally versus engaging a managed service or hiring someone to implement a solution, which may involve a combination of consulting and software tool provision.

Abomhara M, et al. (2015). Cyber security and the internet of things: vulnerabilities, threats, intruders and attacks. J Cyber Secur Mob. 4(1):65–88.
Densham B. (2015). Three cyber-security strategies to mitigate the impact of a data breach. Netw Secur. (1):5–8.
