A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques. The database management system (DBMS) is the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS software additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a “database system”. Often the term “database” is also used to loosely refer to any of the DBMS, the database system or an application associated with the database.
U.S. intelligence agencies are upgrading their antiquated IT infrastructure and bulging databases to deploy “all-source analysis” platforms, migrating troves of data to the cloud where intelligence analysts can use an array of tools to connect the dots. These modernization efforts are part of a steady stream of U.S. military and intelligence initiatives aimed at getting a handle on sensor, intelligence, and logistics data that remains difficult to access and analyze quickly.
The military currently uses several integrated database-management systems, which provide overarching support and increased military effectiveness. The Defense Finance and Accounting Services system provides standardized accounting and financial management for the Department of Defense. Marine OnLine is a personnel records database-management system for all military members in the Marine Corps Total Force System. Global Combat Support System-Marine Corps is the Marine Corps’ logistical equipment database-management system. Each of these integrated systems can serve as a model for implementing proper training-record management.
Warfighters at all echelons require tested, secure, seamless access to data across networks, supporting infrastructure, and weapon systems out to the tactical edge. The advanced capabilities provided by DoD’s Digital Modernization program depend upon enterprise data management policies, standards, and practices. Sensors and platforms across all domains must be designed, procured, and exercised with open data standards as a key requirement. Survival on the modern battlefield will depend upon leveraging and making connections among data from diverse sources, using analytic tools for superior situational awareness, and coordinating information for disaggregated-precision effects.
Improving data management will enhance the Department’s ability to fight and win wars in an era of great power competition, and it will enable operators and military decision-makers to harness data to capitalize on strategic and tactical opportunities that are currently unavailable. Adversaries are also racing to amass data superiority, and whichever side can better leverage data will gain military advantage. Our ability to fight and win wars requires that we become world leaders in operationalizing and protecting our data resources at speed and scale, says David L. Norquist, Deputy Secretary of Defense.
Database Management
Computer scientists classify database-management systems according to the database models that they support. Relational databases became dominant in the 1980s. These model data as rows and columns in a series of tables, and the vast majority use SQL for writing and querying data. In the 2000s, non-relational databases became popular, referred to as NoSQL because they use different query languages.
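The contrast between the two models can be made concrete with a short sketch. The following example, using only Python’s standard library, shows relational data queried with SQL alongside a schemaless key-value lookup in the NoSQL style; the table, keys, and records are illustrative assumptions, not drawn from any system named in this article.

```python
# Minimal sketch: relational (SQL) access vs. NoSQL-style key-value access.
import sqlite3

# Relational model: data stored as rows and columns, accessed with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE personnel (id INTEGER PRIMARY KEY, name TEXT, unit TEXT)")
conn.execute("INSERT INTO personnel (name, unit) VALUES (?, ?)", ("J. Smith", "Logistics"))
conn.commit()

for row in conn.execute("SELECT name, unit FROM personnel WHERE unit = ?", ("Logistics",)):
    print(row)  # ('J. Smith', 'Logistics')
conn.close()

# NoSQL-style key-value access: no fixed schema, records retrieved by key
# rather than through a declarative SQL query.
kv_store = {"personnel:1": {"name": "J. Smith", "unit": "Logistics"}}
print(kv_store["personnel:1"]["unit"])  # 'Logistics'
```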
The DBMS provides various functions that allow entry, storage and retrieval of large quantities of information and provides ways to manage how that information is organized. Physically, database servers are dedicated computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large volume transaction processing environments. DBMSs are found at the heart of most database applications. DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs typically rely on a standard operating system to provide these functions.
Database-management systems allow users to store and manage data in an easily accessible, structured manner compared to simple file systems. Traditional file systems lack key advantageous characteristics provided by database-management systems. Database-management systems reduce data redundancy, as duplicate copies are no longer stored in multiple locations. Data security increases due to multiple levels of user access, and a database-management system can provide automated data backups instead of manual backups conducted by the end user. Using a database-management system allows input constraints and increases data normalization while providing user concurrency. Data validation occurs before any changes are committed to the database, enforcing atomicity within the system. Database-management systems support the database-transaction properties of atomicity, consistency, isolation, and durability.
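A minimal sketch of this transaction behaviour, using SQLite from the Python standard library, is shown below. The accounts table, balances, and constraint are illustrative assumptions; the point is that a failed validation rolls back the whole transaction rather than leaving the database half-updated.

```python
# Minimal sketch of validation plus atomic commit/rollback in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alpha", 100), ("bravo", 50)])
conn.commit()

try:
    with conn:  # the 'with' block commits on success and rolls back on error (atomicity)
        conn.execute("UPDATE accounts SET balance = balance - 200 WHERE name = 'alpha'")
        conn.execute("UPDATE accounts SET balance = balance + 200 WHERE name = 'bravo'")
except sqlite3.IntegrityError:
    # The CHECK constraint rejects the overdraft, so neither update is applied:
    # the database stays consistent instead of recording half a transfer.
    print("transfer rejected, transaction rolled back")

print(list(conn.execute("SELECT * FROM accounts")))  # balances unchanged
conn.close()
```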
Ensuring that database systems perform effectively is a core requirement of modern IT management. At a high level, database performance can be defined as the rate at which a database management system (DBMS) supplies information to users. Database performance can be more accurately defined as the optimization of resource usage to increase throughput and minimize contention, enabling the largest possible workload to be processed. There are several types of tools that can help database administrators and other IT professionals monitor, manage and improve the performance of databases and the applications that access them.
Database performance software can monitor a database system to find problems as they occur, analyze data from the system to determine a corrective action, and implement a fix to alleviate the problem.
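The monitoring side of such tools reduces to measuring how quickly the DBMS answers a representative workload. The sketch below times a repeated query to estimate latency and throughput; the table, query, and alert threshold are illustrative assumptions rather than features of any particular monitoring product.

```python
# Minimal sketch of database performance monitoring: latency and throughput of a query.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [("s1", i * 0.1) for i in range(10_000)])
conn.commit()

runs = 200
start = time.perf_counter()
for _ in range(runs):
    conn.execute("SELECT sensor, AVG(value) FROM readings GROUP BY sensor").fetchall()
elapsed = time.perf_counter() - start

avg_latency_ms = (elapsed / runs) * 1000
throughput_qps = runs / elapsed
print(f"avg latency: {avg_latency_ms:.2f} ms, throughput: {throughput_qps:.0f} queries/s")

# A real performance tool would compare these numbers against a baseline and
# suggest a corrective action (indexing, query rewrite, more memory) when they degrade.
if avg_latency_ms > 50:  # illustrative threshold
    print("latency above threshold - investigate indexing or contention")
```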
Data as a service (DaaS), which provides access to all forms of data across the network through a standardized service layer, is seen as the most promising development by some industry leaders. “To effectively leverage data for competitive advantage, we believe a fundamental technology change is needed to the solutions that enterprises use to organize and access their data,” said Von Wright, chief strategy and marketing officer for K2View. A DaaS platform facilitates this, “rapidly integrating data across disparate sources and delivering it in real time to any end user application—a serious challenge for traditional data solutions.”
Before DaaS, most data management platforms stored information using a row and column approach that creates complex, inflexible, and time-intensive transactions every time a user needs to access data, Wright added. Instead, DaaS supports models “built on specific business needs rather than a predefined technology or structure.”
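The core idea of that service layer can be sketched briefly: consumers request a business entity over a standard interface and never touch the underlying row-and-column storage. The example below uses Flask and a hard-coded record purely for illustration; the endpoint path, fields, and data are assumptions and do not represent K2View’s actual product API.

```python
# Minimal sketch of a DaaS-style service layer: one business-shaped response,
# regardless of how the backing data is stored.
from flask import Flask, jsonify, abort

app = Flask(__name__)

# In a real DaaS platform this view would be assembled in real time from many
# disparate sources (relational tables, logs, message queues); here it is a dict.
CUSTOMERS = {
    "1001": {"name": "Example Corp", "orders": 12, "status": "active"},
}

@app.route("/api/customers/<customer_id>")
def get_customer(customer_id):
    entity = CUSTOMERS.get(customer_id)
    if entity is None:
        abort(404)
    return jsonify(entity)  # the caller sees a business entity, not rows and columns

if __name__ == "__main__":
    app.run(port=8080)  # e.g. curl http://localhost:8080/api/customers/1001
```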
Database security
Data security has become a necessity for every individual who is connected to the internet and uses it for any purpose. It is a requirement in every aspect of operations performed online: online money transactions, transfers of sensitive information, web services, and numerous other operations all need data security. Along with these operations on the internet, data security is also essential in databases.
Data breaches are a common occurrence, and a single successful cyberattack can compromise information belonging to thousands or millions of people. An open database exposing records containing the sensitive data of hotel customers as well as US military personnel and officials was disclosed by researchers in 2019. “The greatest risk posed by this leak is to the US government and military,” the team says. “Significant amounts of sensitive employee and military personnel data could now be in the public domain. This gives invaluable insight into the operations and activities of the US government and military personnel. The national security implications for the US government and military are wide-ranging and serious.”
Databases are the storage areas where large amounts of information are stored. The nature of the information varies across organizations and companies, but every type of information needs some security to preserve the data, and the level of security depends on the nature of the information. For example, military databases require top-level security so that information is accessible only to the concerned authority and not to outsiders, because the leakage of critical information in this case could be dangerous and even life-threatening. The level of security needed for the database of a public server, by contrast, may not be as intensive as for a military database.
Database security concerns the use of a broad range of information security controls to protect databases (potentially including the data, the database applications or stored functions, the database systems, the database servers and the associated network links) against compromises of their confidentiality, integrity and availability. It involves various types or categories of controls, such as technical, procedural/administrative and physical.
Databases have been largely secured against hackers through network security measures such as firewalls, and network-based intrusion detection systems. While network security controls remain valuable in this regard, securing the database systems themselves, and the programs/functions and data within them, has arguably become more critical as networks are increasingly opened to wider access, in particular access from the Internet. Furthermore, system, program, function and data access controls, along with the associated user identification, authentication and rights management functions, have always been important to limit and in some cases log the activities of authorized users and administrators. In other words, these are complementary approaches to database security, working from both the outside-in and the inside-out as it were.
Another technique that can be used to secure a database is access control, where access to the system is granted only after the user’s credentials have been verified. User identification and authentication are normally required before the user can access the database; authentication methods include passwords, biometric readers, and signature-analysis devices, and they allow better management of users. The second requirement involves authorization and access controls: the rules that govern which users have access to which information, and how that information may be disclosed and modified. Access-control policies govern these authorizations. There must also be integrity and consistency in database operations, with a correct set of rules in place to protect the database from malicious destruction. Auditing is a further requirement: a record of actions pertaining to database operations must be kept so that the effectiveness of the control system can be reviewed and improvements recommended.
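The sketch below ties the three steps together in a few lines of Python: authenticate, authorize, and write every attempted action to an audit trail. The users, roles, permissions, and hashing scheme are illustrative assumptions; a real DBMS enforces these with its own account system, GRANT statements, and audit logging facilities.

```python
# Minimal sketch of authentication, authorization and auditing for database access.
import hashlib
from datetime import datetime, timezone

USERS = {"analyst1": hashlib.sha256(b"correct horse").hexdigest()}   # stored password hashes
ROLES = {"analyst1": "reader"}
PERMISSIONS = {"reader": {"SELECT"}, "admin": {"SELECT", "INSERT", "UPDATE", "DELETE"}}
AUDIT_LOG = []

def authenticate(user, password):
    """Identification and authentication: verify credentials before any access."""
    return USERS.get(user) == hashlib.sha256(password.encode()).hexdigest()

def authorize(user, operation):
    """Authorization: access-control rules decide what each user may do."""
    return operation in PERMISSIONS.get(ROLES.get(user, ""), set())

def execute(user, password, operation, table):
    allowed = authenticate(user, password) and authorize(user, operation)
    # Auditing: record every attempted action so the controls can be reviewed later.
    AUDIT_LOG.append((datetime.now(timezone.utc).isoformat(), user, operation, table, allowed))
    if not allowed:
        raise PermissionError(f"{user} may not {operation} on {table}")
    return f"{operation} on {table} executed"

print(execute("analyst1", "correct horse", "SELECT", "personnel"))
try:
    execute("analyst1", "correct horse", "DELETE", "personnel")
except PermissionError as err:
    print(err)          # reader role cannot delete
print(AUDIT_LOG)        # both attempts are recorded, including the denied one
```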
System Designed To Improve Database Performance For Health Care, IoT
One of the big challenges for using databases, whether for health care, the Internet of Things, or other data-intensive applications, is that higher speeds come at the price of higher operating costs, leading to over-provisioning of data centers to maintain high data availability and database performance. With higher data volumes, databases may queue workloads, such as reads and writes, and fail to yield stable and predictable performance, which may be a deal-breaker for critical autonomous systems in smart cities or in the military.
A team of computer scientists from Purdue University has created a system, called SOPHIA, designed to help users reconfigure databases for optimal performance with time-varying workloads and for diverse applications ranging from metagenomics to high-performance computing (HPC) to IoT, where high-throughput, resilient databases are critical. The Purdue team presented the SOPHIA technology at the 2019 USENIX Annual Technical Conference.
“You have to look before you leap when it comes to databases,” said Somali Chaterji, a Purdue assistant professor of agricultural and biological engineering, who directs the Innovatory for Cells and Neural Machines [ICAN] and led the paper. “You don’t want to be a systems administrator who constantly changes the database’s configuration parameters, naïvely, with a parameter space of more than 50 performance-sensitive and often interdependent parameters, because there is a performance cost to the reconfiguration step. That is where SOPHIA’s cost-benefit analyzer comes into play, as it performs reconfiguration of noSQL databases only when the benefit outweighs the cost of the reconfiguration.”
SOPHIA uses workload duration information to estimate the cost and benefit of each reconfiguration step and generates reconfiguration plans that are globally beneficial. The system has three components: a workload predictor, a cost-benefit analyzer, and a decentralized reconfiguration protocol that is aware of the data-availability requirements of the organization.
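The cost-benefit reasoning can be illustrated with a toy calculation: reconfigure only when the extra work the new configuration can serve over the predicted workload window outweighs the one-time cost of switching. The numbers and the linear model below are assumptions for illustration only, not SOPHIA’s actual algorithm or code.

```python
# Illustrative sketch of a cost-benefit check for database reconfiguration.
def should_reconfigure(predicted_duration_s, current_throughput, new_throughput, reconfig_cost_ops):
    """Return True if switching configurations pays off over the workload window."""
    gain = (new_throughput - current_throughput) * predicted_duration_s  # extra ops served
    return gain > reconfig_cost_ops  # reconfigure only when benefit exceeds cost

# Short burst: the gain does not cover the disruption of reconfiguring.
print(should_reconfigure(predicted_duration_s=30, current_throughput=8_000,
                         new_throughput=9_000, reconfig_cost_ops=60_000))   # False

# Long, sustained shift in the workload: reconfiguration is worth the disruption.
print(should_reconfigure(predicted_duration_s=600, current_throughput=8_000,
                         new_throughput=9_000, reconfig_cost_ops=60_000))   # True
```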
“Our three components work together to understand the workload for a database and then performs a cost-benefit analysis to achieve optimized performance in the face of dynamic workloads that are changing frequently,” said Saurabh Bagchi, a Purdue professor of electrical and computer engineering and computer science (by courtesy). “The final component then takes all of that information to determine the best times to reconfigure the database parameters to achieve maximum success.” The Purdue team benchmarked the technology using Cassandra and Redis, two well-known noSQL databases, a major class of databases that is widely used to support application areas such as social networks and streaming audio-video content.
“Redis is a special class of noSQL databases in that it is an in-memory key-value data structure store, albeit with hard disk persistence for durability,” Chaterji said. “So, with Redis, SOPHIA can serve as a way to bring back the deprecated virtual memory feature of Redis, which will allow for data volumes bigger than the machine’s RAM.” The lead developer on the project is Ashraf Mahgoub, a Ph.D. student in computer science. This summer he will go back for an internship with Microsoft Research, and when he returns this fall, he will continue to work on more optimization techniques for cloud-hosted databases.
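For readers unfamiliar with the in-memory key-value model Chaterji describes, the short sketch below shows typical Redis usage through the redis-py client. It assumes a Redis server running locally on the default port; the key names and values are illustrative and unrelated to the Purdue experiments.

```python
# Minimal sketch of Redis's in-memory key-value model via the redis-py client.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Keys map directly to values held in memory; no schema or SQL is involved.
r.set("sensor:42:last_reading", "23.7")
r.expire("sensor:42:last_reading", 3600)          # optional time-to-live in seconds
print(r.get("sensor:42:last_reading"))            # '23.7'

# Hashes group related fields under one key, a common pattern for small records.
r.hset("vehicle:7", mapping={"status": "active", "fuel_pct": "64"})
print(r.hgetall("vehicle:7"))                     # {'status': 'active', 'fuel_pct': '64'}
```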
The Purdue team’s testing showed that SOPHIA achieved significant benefit over both default and static-optimized database configurations. This benefit stays even when there is significant uncertainty in predicting the exact job characteristics. The work also showed that Cassandra could be used in preference to the recent popular drop-in ScyllaDB, an auto-tuning database, with higher throughput across the entire range of workload types, as long as a dynamic tuner, such as SOPHIA, is overlaid on top of Cassandra. SOPHIA was tested with MG-RAST, a metagenomics platform for microbiome data; high-performance computing workloads; and IoT workloads for digital agriculture and self-driving cars.
DoD Intel Agency Upgrades Databases
The CIA and other spy agencies have moved aggressively in recent years to migrate troves of data to the cloud where an array of tools can be used by intelligence analysts to connect the dots. Those modernization efforts are expanding to other intelligence agencies as they seek to get a handle on sensor and other unstructured data. Among them is the U.S. Defense Intelligence Agency (DIA), which recently awarded a $690 million “task order” contract to Northrop Grumman Corp. to deliver a big data platform. (Task orders, sometimes referred to as “mini-contracts,” differ from standard procurement deals in that they specify precisely how funds are spent and provide a delivery schedule that conforms to a master contract.)
The company (NYSE: NOC) said this week it will supply the data platform under DIA’s TALOS program, which stands for Transforming All-Source Analysis with Location-Based Object Services. The TALOS system would incorporate a next-generation database dubbed Machine-Assisted Rapid-Repository System, or MARS. The system is billed as “transforming current databases housing foundational military intelligence into multi-dimensional, flexible and rigorous data environments….” “With data proliferating at the speed of light, DIA must build a system capable of ingesting and managing large volumes of it, and making it available to both humans and machines,” the agency said.
Under the TALOS task order, Northrop Grumman said it would serve as “enterprise module integrator” for MARS. That task includes implementing AI and machine learning tools that will be used to ingest and manage large volumes of intelligence data. The system would allow intelligence analysts to leverage predictive analytics to aid military commanders in the field. “For analysts and planners, it means being able to leverage a virtual model of the real world that’s so thorough it cuts across all domains, significantly increasing the warfighter’s ability to mitigate risks and defeat adversaries,” DIA said.
The TALOS task order was awarded competitively in August, the company said. The DIA data effort is the latest in a steady stream of U.S. military efforts aimed at upgrading databases bulging with intelligence and logistics data that remain difficult to access and quickly analyze. The Air Force, for example, is embracing commercial machine learning platforms to streamline its flight certification process as well as for predictive aircraft maintenance.
The Defense Information Systems Agency has been offering “milDrive,” a cloud-based storage service.
The cloud service already has about 18,000 users across 20 organizations, the program manager said. “There’s quite a large user base in the queue right now that’s interested, and we are currently piloting with and developing a migration strategy for them,” said Carissa Landymore. “The need is definitely there.” The milDrive service is available for users on DODIN, the unclassified Defense Department information network.
Users often store files on network drives so they can be shared with others within their organizations. The milDrive service gives users that ability, and it also allows them to access files from any common access card-enabled computer on the network and from their government cell phones and tablets. Typically, network shared drives only allow users to access files when they are on their home network. Unlike other cloud-service solutions in use by some DOD agencies, milDrive allows users to store files that contain personally identifiable information, personal health information and “for official use only” information because the storage for milDrive is maintained by DISA, rather than by a commercial provider, Landymore said. “From a security perspective, all the data is always encrypted, in transit and at rest,” she said. “So, it’s always providing that extra blanket of security.”
Also, unlike with typical network shares, milDrive users can grant access to their files to any milDrive user in the Defense Department, Landymore said. Users can even share files with other DOD personnel who don’t have milDrive access through a web-based interface. And unlike some web-based cloud service solutions, milDrive is thoroughly integrated into the desktop environment, which means users can create, read and manipulate files stored in the cloud using the software already installed on their desktop computers. “It’s completely integrated and transparent on your desktop,” she said. “It’s the same traditional look and feel as Windows File Explorer and used like any other location to open or save files.”
Landymore said DISA offers 1-terabyte or 20-gigabyte licenses for individual users; both cost less than $10 a month. Organizations can also order “team drives” starting at 1 TB. As with traditional network shares, milDrive “Team Folders” let organizations collaborate as they traditionally have, with the added benefits of online and offline access, mobility, and portability of group data that they do not have today.
Future of the Database Market Is the Cloud
By 2022, 75% of all databases will be deployed or migrated to a cloud platform, with only 5% ever considered for repatriation to on-premises, according to Gartner, Inc. This trend will largely be due to databases used for analytics, and the SaaS model. “According to inquiries with Gartner clients, organizations are developing and deploying new applications in the cloud and moving existing assets at an increasing rate, and we believe this will continue to increase,” said Donald Feinberg, distinguished research vice president at Gartner. “We also believe this begins with systems for data management solutions for analytics (DMSA) use cases — such as data warehousing, data lakes and other use cases where data is used for analytics, artificial intelligence (AI) and machine learning (ML). Increasingly, operational systems are also moving to the cloud, especially with conversion to the SaaS application model.”
Gartner research shows that 2018 worldwide database management system (DBMS) revenue grew 18.4% to $46 billion. Cloud DBMS revenue accounts for 68% of that 18.4% growth — and Microsoft and Amazon Web Services (AWS) account for 75.5% of the total market growth. This trend reinforces that cloud service provider (CSP) infrastructures and the services that run on them are becoming the new data management platform.
Ecosystems are forming around CSPs that both integrate services within a single CSP and provide early steps toward intercloud data management. This is in distinct contrast to the on-premises approach, where individual products often serve multiple roles but rarely offer their own built-in capabilities to support integration with adjacent products within the on-premises deployment environment. While there is some growth in on-premises systems, this growth is rarely from new on-premises deployments; it is generally due to price increases and forced upgrades undertaken to avoid risk. “Ultimately what this shows is that the prominence of the CSP infrastructure, its native offerings, and the third-party offerings that run on them is assured,” said Mr. Feinberg. “A recent Gartner cloud adoption survey showed that of those on the public cloud, 81% were using more than one CSP. The cloud ecosystem is expanding beyond the scope of a single CSP — to multiple CSPs — for most cloud consumers.”
References and Resources also include:
https://www.datanami.com/2020/10/01/dod-intel-agency-upgrades-databases/