Technical risk management

Rajesh Uppal, July 4, 2022

All organizations face uncertainty. The effect this uncertainty has on an organization’s objectives is referred to as “risk.” The challenge for management is to determine how much uncertainty or risk to accept and how to manage it to an acceptable level.

Risk management includes the strategies, processes, systems, and people directed toward the effective management of potential threats and opportunities. The goal of risk management is to provide stakeholders with reasonable assurance that your organization’s objectives will be achieved, opportunities will be identified and seized, and future risk response decisions are appropriate.

It is important to understand the organization’s objectives, philosophy and culture, strategies, and internal and external SWOTs (strengths, weaknesses, opportunities, and threats) to fully understand the potential risks and impact.

Most companies are much better at introducing new technologies than retiring them. The cost of running unsupported technology can be high. Costs of IT outages and data breaches run into the millions. At the end-of-life of technology, IT management has to deal with challenges such as integration issues, limited functionality, low service levels, lack of available skills, and missing support from vendors.

The Technical Risk Management Process is one of the crosscutting technical management processes. Risk is the potential for performance shortfalls, which may be realized in the future, with respect to achieving explicitly established and stated performance requirements.

The performance shortfalls may be related to institutional support for mission execution or related to any one or more of the following mission execution domains: safety, technical, cost, and schedule. Systems engineers are involved in this process to help identify potential technical risks, develop mitigation plans, monitor the progress of the technical effort to determine if new risks arise or old risks can be retired, and be available to answer questions and resolve issues.

Risk is characterized by three basic components:

The scenario(s) leading to degraded performance with respect to one or more performance measures (e.g., scenarios leading to injury, fatality, destruction of key assets; scenarios leading to exceedance of mass limits; scenarios leading to cost overruns; scenarios leading to schedule slippage);
The likelihood(s) (qualitative or quantitative) of those scenario(s); and
The consequence(s) (qualitative or quantitative severity of the performance degradation) that would result if the scenario(s) was (were) to occur.

Scenario: A sequence of credible events that specifies the evolution of a system or process from a given state to a future state. In the context of risk management, scenarios are used to identify the ways in which a system or process in its current state can evolve to an undesirable state.

Undesired scenario(s) might come from technical or programmatic sources (e.g., a cost overrun, schedule slippage, safety mishap, health problem, malicious activities, environmental impact, or failure to achieve a needed scientific or technological objective or success criterion). Both the likelihood and consequences may have associated uncertainties.

Uncertainties are included in the evaluation of likelihoods and consequences.
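As a rough illustration of the three components above, a risk can be recorded as a scenario together with a likelihood and a consequence, each held as a low/high range to reflect uncertainty. This is only a sketch: the `Risk` class, its field names, the dollar figures, and the idea of bounding an expected loss by multiplying the ranges are assumptions made for the example, not part of any cited process.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Risk:
    """One risk: a scenario plus uncertain likelihood and consequence (hypothetical model)."""
    scenario: str                      # what could go wrong
    likelihood: Tuple[float, float]    # (low, high) probability estimates
    consequence: Tuple[float, float]   # (low, high) cost impact in dollars

    def expected_loss_range(self) -> Tuple[float, float]:
        """Bound the expected loss by pairing the low and the high estimates."""
        p_low, p_high = self.likelihood
        c_low, c_high = self.consequence
        return (p_low * c_low, p_high * c_high)


# Illustrative numbers only.
risk = Risk(
    scenario="Launch slips past the contracted window",
    likelihood=(0.10, 0.30),
    consequence=(200_000.0, 500_000.0),
)
low, high = risk.expected_loss_range()
print(f"{risk.scenario}: expected loss between {low:,.0f} and {high:,.0f} dollars")
```

Keeping the ranges rather than single point values makes the uncertainty visible whenever the risk is reported.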

Cost Risk: This is the risk associated with the ability of the program/project to achieve its life-cycle cost objectives and secure appropriate funding. Two risk areas bearing on cost are (1) the risk that the cost estimates and objectives are not accurate and reasonable; and (2) the risk that program execution will not meet the cost objectives as a result of a failure to handle cost, schedule, and performance risks.

Schedule Risk: Schedule risks are those associated with the adequacy of the time estimated and allocated for the development, production, implementation, and operation of the system. Two risk areas bearing on schedule risk are (1) the risk that the schedule estimates and objectives are not realistic and reasonable; and (2) the risk that program execution will fall short of the schedule objectives as a result of failure to handle cost, schedule, or performance risks.

Programmatic Risk: This is the risk associated with action or inaction from outside the project, over which the project manager has no control, but which may have significant impact on the project. These impacts may manifest themselves in terms of technical, cost, and/or schedule. This includes such activities as: International Traffic in Arms Regulations (ITAR), import/export control, partner agreements with other domestic or foreign organizations, congressional direction or earmarks, Office of Management and Budget (OMB) direction, industrial contractor restructuring, external organizational changes, etc.

Technical Risk

This is the risk associated with the evolution of the design and the production of the system of interest affecting the level of performance necessary to meet the stakeholder expectations and technical requirements. The design, test, and production processes (process risk) influence the technical risk and the nature of the product as depicted in the various levels of the product breakdown structure, or PBS (product risk).

Consider the following:

Disaster preparedness and recovery
Data security
Information privacy
Compliance (with laws and regulations)
System Development Life Cycle (software development) projects
Large-scale system implementation and integration projects
Management of vendor/servicer arrangements

Risks in Space Industry

In a space mission, a risk is a potential failure that can take place during the design, build, transportation, launch or operation of a spacecraft on orbit. During the period up until launch a failure can result in cost overruns, schedule delays and potentially the loss of a critical function of the spacecraft.

During and after launch, a failure can result in the partial or complete loss of the spacecraft or a function of the spacecraft. Depending on the severity, this can lead to a total loss of mission or a reduction in the performance or lifetime of the spacecraft.

The number of satellites on orbit has increased sharply, which raises the risk of orbital conjunctions or collisions. At the same time, NewSpace satellites tend to fly in lower orbits and reenter in a much shorter period.

In general, in the CubeSat domain there is a higher chance of satellites failing on orbit due to reduced design review and analysis and, in many cases, a lack of redundancy.

“For us as a new company, risks that materialize can lead to the death of the company. We also need to move very fast because the market is rapidly changing at the moment. I have asked my team to prioritize schedule and reliability over performance. We can rather develop performance incrementally over multiple future versions with an iterative design philosophy,” says Bryan Dean, CEO of Dragonfly Aerospace.

“In terms of identifying risks, our team focuses on part stress analysis, failure mode analysis and identifying single point failures. In general, we apply a dual redundancy policy where possible and try to remove any single point failures. Where we cannot remove single point failures, we try to have graceful degradation.”
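The effect of a dual redundancy policy can be sketched with the standard parallel reliability formula, R = 1 - (1 - r)^n. The sketch assumes the redundant units fail independently and that one surviving unit is enough to keep the function working. It ignores common-cause failures and switching logic, so it is an optimistic bound. The function name and the 0.95 unit reliability are hypothetical.

```python
def parallel_reliability(unit_reliability: float, copies: int = 2) -> float:
    """Reliability of `copies` independent redundant units when one survivor suffices."""
    if not 0.0 <= unit_reliability <= 1.0:
        raise ValueError("unit_reliability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The function is lost only if every copy fails.
    return 1.0 - (1.0 - unit_reliability) ** copies


single = parallel_reliability(0.95, copies=1)  # single string, a single point failure
dual = parallel_reliability(0.95, copies=2)    # dual redundant
print(f"single string: {single:.4f}, dual redundant: {dual:.4f}")
```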

Risk: Single event effects (SEE), such as single event upset (SEU), single event latchup (SEL), and single event gate rupture (SEGR). The traditional approach is to use expensive radiation-hardened electronic components and materials, often with reduced performance and increased mass.

Opportunity: A commercial component approach makes use of the latest advances in electronics and materials, offering reduced cost, increased performance, and higher availability.

Operational risk. Understand the operational nature of the capabilities you are supporting and the risk to the end users, their missions, and their operations of the capabilities. Understanding the operational need/mission will help you appreciate the gravity of risks and the impact they could have on the end users. A critical part of risk analysis is realizing the real-world impact that can occur if a risk arises during operational use. Typically, operational users are willing to accept some level of risk if they are able to accomplish their mission (e.g., mission assurance), but you need to help users understand the risks they are accepting and assess the options, balances, and alternatives available.

Technical maturity. Work with and leverage industry and academia to understand the technologies being considered for an effort and the likely transition of the technology over time. One approach is to work with vendors under a non-disclosure agreement to understand the capabilities and where they are going, so that the risk can be assessed.

Non-Developmental Items (NDI). NDI includes commercial-off-the-shelf and government-off-the-shelf items. To manage risk, consider the viability of the NDI provider. Does the provider have market share? Does the provider have appropriate longevity compared to its competitors? How does the provider address capability problems and release fixes, etc.? What is the user base for the particular NDI? Can the provider demonstrate the NDI, preferably in a setting similar to that of your customer? Can the government use the NDI to create a prototype? All of these factors will help assess the risk of the viability of the NDI and the provider.

Acquisition drivers. Emphasize critical capability enablers, particularly those that have limited alternatives. Evaluate and determine the primary drivers of an acquisition and emphasize their associated risk in formulating risk mitigation recommendations. If a particular aspect of a capability is not critical to its success, its risk should be assessed, but it need not be the primary focus of risk management. For example, if there is risk to a proposed user interface, but the marketplace has numerous alternatives, the success of the proposed approach is probably less critical to overall success of the capability. On the other hand, an information security feature is likely to be critical. If only one alternative approach satisfies the need, emphasis should be placed on it. Determine the primary success drivers by evaluating needs and designs, and determining the alternatives that exist. Is a unique solution on the critical path to success? Make sure critical path analyses include the entire system engineering cycle and not just development (for example, system development, per se, may be a “piece of cake,” but fielding in an active operational situation may be a major risk).

External and internal dependencies. Having an enterprise perspective can help acquirers, program managers, developers, integrators, and users appreciate risk from dependencies of a development effort. With the emergence of service-oriented approaches, a program will become more dependent on the availability and operation of services provided by others that they intend to use in their program’s development efforts. Work with the developers of services to ensure quality services are being created, and thought has been put into the maintenance and evolution of those services. Work with the development program staff to assess the services that are available, their quality, and the risk that a program has in using and relying upon the service. Likewise, there is a risk associated with creating the service and not using services from another enterprise effort. Help determine the risks and potential benefits of creating a service internal to the development with possibly a transition to the enterprise service at some future time.

Integration and Interoperability (I&I). I&I will almost always be a major risk factor. They are forms of dependencies in which the value of integrating or interoperating has been judged to override their inherent risks. Techniques such as enterprise federation architecting, composable capabilities on demand, and design patterns can help the government plan and execute a route to navigate I&I risks.

Information security. Information security is a risk in nearly every development. Some of this is due to the uniqueness of government needs and requirements in this area. Some of this is because of the inherent difficulties in countering cyber attacks. Creating defensive capabilities to cover the spectrum of attacks is challenging and risky. Help the government develop resiliency approaches (e.g., contingency plans, backup/recovery, etc.). Enabling information sharing across systems in coalition operations with international partners presents technical challenges and policy issues that translate into development risk. The information security issues associated with supply chain management are so broad and complex that even maintaining rudimentary awareness of the threats is a tremendous challenge.

Skill level. The skill or experience level of the developers, integrators, government, and other stakeholders can lead to risks. Be on the lookout for insufficient skills and reach across the corporation to fill any gaps. In doing so, help educate government team members at the same time you are bringing corporate skills and experience to bear.

Now consider the following questions regarding those efforts:

What might go wrong?
What might happen if it does go wrong?
How do we prevent it from going wrong?
How will we know if it does go wrong?
What will we do when it goes wrong?

Inputs

The following are typical inputs to risk management:

Project Risk Management Plan: The Risk Management Plan is developed under the Technical Planning Process and defines how risk will be identified, mitigated, monitored, and controlled within the project.
Technical Risk Issues: These will be the technical issues identified as the project progresses that pose a risk to the successful accomplishment of the project mission/goals.
Technical Risk Status Measurements: These are any measures that are established that help to monitor and report the status of project technical risks.
Technical Risk Reporting Requirements: Includes requirements of how technical risks will be reported, how often, and to whom.

Additional inputs that may be useful:

Other Plans and Policies: Systems Engineering Management Plan, form of technical data products, and policy input to metrics and thresholds.
Technical Inputs: Stakeholder expectations, concept of operations, imposed constraints, tracked observables, current program baseline, performance requirements, and relevant experience data.

Identify Technical Risks

On a continuing basis, the technical team will identify technical risks including their source, analyze the potential consequence and likelihood of the risks occurring, and prepare clear risk statements for entry into the program/project risk management system. Coordination with the relevant stakeholders for the identified risks is included.

Facilitated group discussions may be an effective way to gather this information. Other risk identification techniques include:

Questionnaires
Industry benchmarking
Scenario analysis
Results of event tracking and historic trend analysis

Risk Assessment

Many organizations are comfortable using a simple low-medium-high or numeric value (1-3) to describe probability or impact. This should be an organization-wide standard if other risk management activities are underway.

The assessment should include quantitative factors such as dollars, percentages, time, and number of transactions. It is also common to include qualitative factors such as loss of customers and market share, damaged reputation, or loss of stakeholder confidence.
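The low-medium-high (1-3) scale described above can be turned into a simple scoring scheme. The sketch below multiplies probability by impact and bands the result. The multiplication, the band boundaries, and the sample register entries are assumptions for illustration, and an organization would set its own standard.

```python
# Map the low-medium-high scale onto the numeric values 1-3.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def risk_score(probability: str, impact: str) -> int:
    """Score a risk as probability times impact (possible values: 1, 2, 3, 4, 6, 9)."""
    return LEVELS[probability] * LEVELS[impact]


def risk_rating(score: int) -> str:
    """Band a score into an overall rating. The cut-offs are hypothetical."""
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


# Illustrative register entries: (description, probability, impact).
register = [
    ("Vendor ends support for core database", "medium", "high"),
    ("Key integrator leaves the project", "low", "medium"),
    ("Data breach through unpatched server", "medium", "medium"),
]

# Rank the register so the highest-scoring risks are reviewed first.
ranked = sorted(register, key=lambda r: risk_score(r[1], r[2]), reverse=True)
for name, probability, impact in ranked:
    score = risk_score(probability, impact)
    print(f"{score} ({risk_rating(score)}): {name}")
```

Qualitative factors such as reputational damage can be mapped onto the same scale so that they rank alongside the quantitative ones.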

Prepare for Technical Risk Mitigation

This includes selecting the risks that will be mitigated and more closely monitored, identifying the risk level or threshold that will trigger a risk mitigation action plan, and identifying, for each risk, which stakeholders will need to be informed once a mitigation/contingency action is decided and which organizations will need to become involved to perform it.

Monitor the Status of Each Technical Risk Periodically

Risk status will need to be monitored periodically at a frequency identified in the risk plan. Risks that are approaching the trigger thresholds will be monitored on a more frequent basis. Reports of the status are made to the appropriate program/project management or board for communication and for decisions whether to trigger a mitigation action early. Risk status will also be reported at most life cycle reviews.
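The monitoring rule described above can be sketched as a comparison of each risk's current score with its trigger threshold. The function name, the returned status strings, and the 80 percent "watch" band are assumptions for illustration. The actual thresholds and frequencies would come from the risk plan.

```python
def monitoring_action(score: float, trigger: float, watch_fraction: float = 0.8) -> str:
    """Decide how to treat a risk given its current score and trigger threshold.

    `watch_fraction` is a hypothetical setting: a risk whose score reaches this
    fraction of the trigger is treated as approaching the threshold.
    """
    if trigger <= 0:
        raise ValueError("trigger must be positive")
    if score >= trigger:
        return "trigger mitigation"
    if score >= watch_fraction * trigger:
        return "monitor more frequently"
    return "monitor at planned frequency"


# Illustrative status check for three risks that share a trigger threshold of 6.
for name, score in [("Thermal margin", 6), ("Supplier delay", 5), ("Staff turnover", 2)]:
    print(f"{name}: {monitoring_action(score, trigger=6)}")
```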

Implement Technical Risk Mitigation and Contingency Action Plans as Triggered

When the applicable thresholds are triggered, the technical risk mitigation and contingency action plans are implemented. This includes monitoring the results of the action plan implementation and modifying them as necessary, continuing the mitigation until the residual risk and/or consequence impacts are acceptable, and communicating the actions and results to the identified stakeholders. Action plan reports are prepared and results reported at appropriate boards and at life cycle reviews.

Outputs

The following are key outputs from risk management activities:

Technical Risk Mitigation and/or Contingency Actions: Actions taken to mitigate identified risks or contingency actions taken in case risks are realized.
Technical Risk Reports: Reports of the technical risk policies, status, remaining residual risks, actions taken, etc. These are issued at the agreed-to frequency and to the agreed-to recipients.
Work Products: Includes the procedures for conducting technical risk management; rationale for decisions made; selected decision alternatives; assumptions made in prioritizing, handling, and reporting technical risks; and lessons learned.

Risk Management Tools

Risk analysis and management tools serve multiple purposes and come in many shapes and sizes. Some risk analysis and management tools include those used for:

Strategic and Capability Risk Analysis: Focuses on identifying, analyzing, and prioritizing risks to achieve strategic goals, objectives, and capabilities.
Threat Analysis: Focuses on identifying, analyzing, and prioritizing threats to minimize their impact on national security.
Investment and Portfolio Risk Analysis: Focuses on identifying, analyzing, and prioritizing investments and possible alternatives based on risk.
Program Risk Management: Focuses on identifying, analyzing, prioritizing, and managing risks to eliminate or minimize their impact on a program’s objectives and probability of success.
Cost Risk Analysis: Focuses on quantifying how technological and economic risks may affect a system’s cost. Applies probability methods to model, measure, and manage risk in the cost of engineering advanced systems.
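As one example of the probability methods used in cost risk analysis, a Monte Carlo simulation can estimate the spread of total cost from three-point estimates of each cost element. This is a minimal sketch: the triangular distribution, the assumption that elements are independent, the cost figures, and the function names are all illustrative choices rather than a description of any particular tool.

```python
import random


def simulate_total_cost(elements, trials=10_000, seed=1):
    """Return sorted simulated total costs.

    `elements` holds (low, most_likely, high) estimates per cost element. Each
    element is sampled independently from a triangular distribution, which is
    an assumption of this sketch.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    totals = [
        sum(rng.triangular(low, high, mode) for low, mode, high in elements)
        for _ in range(trials)
    ]
    totals.sort()
    return totals


def percentile(sorted_values, fraction):
    """Pick the value at the given fraction (0 to 1) of a sorted list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


# Hypothetical three-point estimates in millions of dollars.
elements = [
    (1.0, 1.2, 2.0),  # payload development
    (0.5, 0.6, 1.0),  # integration and test
    (0.2, 0.3, 0.8),  # launch services
]
totals = simulate_total_cost(elements)
print(f"median: {percentile(totals, 0.5):.2f}, 80th percentile: {percentile(totals, 0.8):.2f}")
```

Reporting a percentile rather than a single estimate shows how much reserve would be needed to reach a chosen confidence level.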

Selecting the Right Tool

It is important that the organization define the risk analysis and management process before selecting a tool. Ultimately, the tool must support the process. When selecting a risk analysis and management tool, consider these criteria:

Aligned to risk analysis objectives: Does the tool support the analysis that the organization is trying to accomplish? Is the organization attempting to implement an ongoing risk management process or conduct a one-time risk analysis?
Supports decision making: Does the tool provide the necessary information to support decision making?
Accessibility: Is the tool accessible to all users and key stakeholders? Can the tool be located/hosted where all necessary personnel can access it?
Availability of data: Is data available for the tool’s analysis?
Level of detail: Is the tool detailed enough to support decision making?
Integration with other program management / systems engineering processes: Does the tool support integration with other program management / systems engineering processes?

References and Resources also include

https://www.nasa.gov/seh/6-4-technical-risk-management

https://www.mitre.org/publications/systems-engineering-guide/acquisition-systems-engineering/risk-management/risk-management-tools