Validating XML Schema of OVAL Documents with Python
OVAL, the Open Vulnerability Assessment Language, uses XML-based documents to define vulnerabilities in terms of the characteristics of a host system. It can also be used to gather information about a host. Evaluating an OVAL file produces either a results file describing the outcome of the vulnerability evaluation or a system characteristics file containing the information gathered from the host.
OVAL Definitions, OVAL System Characteristics and OVAL Results
These capabilities rely on three distinct document types: OVAL Definitions, OVAL System Characteristics and OVAL Results. The format of each type is defined by a schema, a document containing the rules that an OVAL document's structure must follow. These rules specify, for example, the order in which elements must appear, how many times an element may appear, whether an element is required, which attributes an element has, and what type of data an element may contain.
Validation of an XML file is the process of evaluating whether it conforms to the format described by the schema. If it conforms to the schema, it is considered valid.
An OVAL interpreter is an executable that evaluates OVAL Definition files and produces OVAL System Characteristics and OVAL Results files. The interpreter generates both of these files when it processes a Definition file. It is therefore responsible for ensuring that they conform to their schemas.
The OVAL Definition file specifies the information to query from a host and how that data should be evaluated. It can be written by hand or generated automatically. Either way, errors or typos can produce a Definition file that does not conform to the schema. An interpreter should generally reject an invalid Definition file. In some cases, however, such a file could cause the interpreter to fail or to generate incorrect data. It is therefore important to ensure that Definition files conform to the schema before passing them to an OVAL interpreter.
One option is to validate generated Definition files with a script. The Python library lxml can parse, modify and generate XML documents, and it can also validate them against a schema. The following code performs XML validation:
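import lxml.etree
# file= takes the path to the .xsd file; validate() expects a parsed document, not a path
schema_validator = lxml.etree.XMLSchema(file="<schema_file>")
xml_doc = lxml.etree.parse("<xml_file>")
is_valid = schema_validator.validate(xml_doc)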
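A boolean alone does not say what is wrong with a document. lxml records the reasons for the last failed validation in the validator's error_log. The sketch below shows one way to surface them.

It uses a tiny inline schema, a hypothetical stand-in rather than an OVAL schema, so that it runs on its own. The helper name check() is illustrative only. Calling assertValid() on the validator instead raises lxml.etree.DocumentInvalid for an invalid document.

```python
import lxml.etree

# Hypothetical stand-in schema (not an OVAL schema) so the example is self-contained.
SCHEMA = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="definition">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
      </xsd:sequence>
      <xsd:attribute name="id" type="xsd:string" use="required"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def check(schema: lxml.etree.XMLSchema, xml_bytes: bytes) -> list[str]:
    """Return validation error messages; an empty list means the document is valid."""
    doc = lxml.etree.fromstring(xml_bytes)
    if schema.validate(doc):
        return []
    # error_log holds one entry per problem found during the last validate() call
    return [f"line {err.line}: {err.message}" for err in schema.error_log]


# XMLSchema also accepts an already parsed schema document instead of file=
validator = lxml.etree.XMLSchema(lxml.etree.fromstring(SCHEMA))
print(check(validator, b'<definition id="d1"><title>ok</title></definition>'))
print(check(validator, b'<definition><title>missing id</title></definition>'))
```

The second call reports the missing required id attribute, with its line number.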
With this code, validating XML against a single schema is straightforward. OVAL documents, however, use multiple schemas to define the rules for their various components. At minimum, an OVAL Definition file uses the oval-common-schema and the oval-definitions-schema. These define the general structure of OVAL and the structure of a Definition file respectively. In addition, at least one other schema is required to define the specific types of data that can be queried from a host, such as package versions, file information and configuration settings. There is generally one of these specific schemas per operating system (e.g., Windows, Linux, macOS). Validating an OVAL Definition therefore requires at least three schemas. This is a problem because lxml's XMLSchema accepts only a single schema document.
Validating an OVAL file in Python
This limitation can be worked around because one schema file can import others with the xsd:import element. When the importing schema is loaded, the schemas it imports are loaded with it, so their rules are also applied during validation. The OVAL schema files use this feature: each OS-specific schema imports the oval-common-schema and the oval-definitions-schema. In most cases, therefore, validating an OVAL file in Python with lxml only requires the schema file for the OS the file targets. For example, a file that queries a Windows host would pass the windows-definitions-schema to lxml. Here is a snippet from the start of the windows-definitions-schema file showing the additional schemas being imported:
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:oval="http://oval.mitre.org/XMLSchema/oval-common-5" xmlns:oval-def="http://oval.mitre.org/XMLSchema/oval-definitions-5" xmlns:win-def="http://oval.mitre.org/XMLSchema/oval-definitions-5#windows" xmlns:sch="http://purl.oclc.org/dsdl/schematron" targetNamespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#windows" elementFormDefault="qualified" version="5.10.1">
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-common-5" schemaLocation="oval-common-schema.xsd"/>
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5" schemaLocation="oval-definitions-schema.xsd"/>
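The import behaviour can be checked with two small stand-in schemas. This is a sketch under stated assumptions: common.xsd, os.xsd, the urn:example namespaces and VersionType are invented for illustration.

The files are written to a temporary directory because the relative schemaLocation is resolved against the importing file's path. Only os.xsd is handed to lxml, yet the version attribute is still checked against the type defined in common.xsd.

```python
import os
import tempfile

import lxml.etree

# Hypothetical stand-ins: os.xsd imports common.xsd, the way windows-definitions-schema.xsd
# imports oval-common-schema.xsd. Neither is a real OVAL schema.
COMMON_XSD = r"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:example:common">
  <xsd:simpleType name="VersionType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[0-9]+\.[0-9]+"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>"""

OS_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:common="urn:example:common" targetNamespace="urn:example:os"
    elementFormDefault="qualified">
  <xsd:import namespace="urn:example:common" schemaLocation="common.xsd"/>
  <xsd:element name="package">
    <xsd:complexType>
      <xsd:attribute name="version" type="common:VersionType" use="required"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def validate_with_os_schema(xml_text: str) -> bool:
    """Load only os.xsd; its import pulls in common.xsd from the same directory."""
    with tempfile.TemporaryDirectory() as tmp:
        for file_name, text in (("common.xsd", COMMON_XSD), ("os.xsd", OS_XSD)):
            with open(os.path.join(tmp, file_name), "w", encoding="utf-8") as fh:
                fh.write(text)
        # The relative schemaLocation is resolved against os.xsd's own location
        schema = lxml.etree.XMLSchema(file=os.path.join(tmp, "os.xsd"))
        return schema.validate(lxml.etree.fromstring(xml_text))


print(validate_with_os_schema('<os:package xmlns:os="urn:example:os" version="1.2"/>'))
print(validate_with_os_schema('<os:package xmlns:os="urn:example:os" version="beta"/>'))
```

The first call should succeed and the second should fail, since "beta" does not match the pattern that only the imported schema defines.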
There are, however, scenarios where a Definition file uses elements from more than one additional schema. This commonly occurs with elements from the independent-definitions-schema. That schema provides functionality that works across operating systems, such as hashing files, checking environment variables and reading file contents. A Definition file written for Windows that uses both the Windows and Independent schemas cannot be validated with lxml by passing in any single one of the standard schema files. Whichever schema is passed in, validation fails on the elements defined in the schema that was not provided.
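Before choosing a schema, it can help to see which namespaces a Definition file uses that a candidate schema does not provide. The sketch below compares the namespaces of the elements in a document with the schema's targetNamespace and its direct xsd:import declarations.

It is an illustrative helper with two assumptions. First, it does not follow nested imports. Second, the Definition fragment is hypothetical and trimmed, not a valid OVAL file. The schema header is the Windows snippet shown earlier, closed so that it parses on its own.

```python
import lxml.etree

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Header of the Windows schema shown earlier, closed so it parses on its own.
WINDOWS_HEAD = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#windows">
  <xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-common-5" schemaLocation="oval-common-schema.xsd"/>
  <xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5" schemaLocation="oval-definitions-schema.xsd"/>
</xsd:schema>"""

# Hypothetical, trimmed Definition fragment mixing Windows and Independent elements.
DEFINITION = b"""<oval_definitions xmlns="http://oval.mitre.org/XMLSchema/oval-definitions-5"
    xmlns:win-def="http://oval.mitre.org/XMLSchema/oval-definitions-5#windows"
    xmlns:ind-def="http://oval.mitre.org/XMLSchema/oval-definitions-5#independent">
  <tests>
    <win-def:registry_test id="oval:example:tst:1"/>
    <ind-def:textfilecontent54_test id="oval:example:tst:2"/>
  </tests>
</oval_definitions>"""


def provided_namespaces(xsd_bytes: bytes) -> set[str]:
    """targetNamespace plus directly imported namespaces; nested imports are not followed."""
    root = lxml.etree.fromstring(xsd_bytes)
    found = {root.get("targetNamespace")}
    found.update(imp.get("namespace") for imp in root.iter(f"{{{XSD_NS}}}import"))
    return found - {None}


def used_namespaces(xml_bytes: bytes) -> set[str]:
    """Namespace URIs of all elements; comments and PIs (non-string tags) are skipped."""
    root = lxml.etree.fromstring(xml_bytes)
    return {lxml.etree.QName(el).namespace for el in root.iter() if isinstance(el.tag, str)} - {None}


missing = used_namespaces(DEFINITION) - provided_namespaces(WINDOWS_HEAD)
print(sorted(missing))
```

Here the only missing namespace is the independent one. That is exactly the schema that passing windows-definitions-schema alone would leave out.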
To resolve this problem, we can use the same import mechanism shown in the example above, this time with a purpose-built test schema. The test schema only needs to import the other schema files required for validation. It does not contain any document structure rules of its own. Any number of schemas can be imported into this file. There is therefore no need to create a separate test schema for every variation of Definition file, even when they are written for completely different operating systems. Here is an example of a single file that imports all the supported OVAL schema files:
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:oval="http://oval.mitre.org/XMLSchema/oval-common-5" xmlns:oval-def="http://oval.mitre.org/XMLSchema/oval-definitions-5" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:sch="http://purl.oclc.org/dsdl/schematron" targetNamespace="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" version="5.10.1">
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#aix" schemaLocation="aix-definitions-schema.xsd"/>
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#apache" schemaLocation="apache-definitions-schema.xsd"/>
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#catos" schemaLocation="catos-definitions-schema.xsd"/>
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#esx" schemaLocation="esx-definitions-schema.xsd"/>
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#freebsd" schemaLocation="freebsd-definitions-schema.xsd"/>
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#hpux" schemaLocation="hpux-definitions-schema.xsd"/>
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#independent" schemaLocation="independent-definitions-schema.xsd"/>
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#ios" schemaLocation="ios-definitions-schema.xsd"/>
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#linux" schemaLocation="linux-definitions-schema.xsd"/>
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#macos" schemaLocation="macos-definitions-schema.xsd"/>
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#pixos" schemaLocation="pixos-definitions-schema.xsd"/>
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#sharepoint" schemaLocation="sharepoint-definitions-schema.xsd"/>
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#solaris" schemaLocation="solaris-definitions-schema.xsd"/>
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#unix" schemaLocation="unix-definitions-schema.xsd"/>
<xsd:import namespace="http://oval.mitre.org/XMLSchema/oval-definitions-5#windows" schemaLocation="windows-definitions-schema.xsd"/>
</xsd:schema>
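Because the test schema is nothing more than a list of imports, it can also be generated. The sketch below assumes that every platform follows the naming pattern in the listing above: a namespace suffix of #name and a file name of name-definitions-schema.xsd. The helper build_wrapper_schema is hypothetical.

Unlike the listing, the generated schema omits targetNamespace. That attribute is optional for a schema that declares no components of its own.

```python
import lxml.etree

XSD_NS = "http://www.w3.org/2001/XMLSchema"
OVAL_DEF_NS = "http://oval.mitre.org/XMLSchema/oval-definitions-5"

# Platform names from the listing above; the namespace and file naming pattern is assumed.
PLATFORMS = ["aix", "apache", "catos", "esx", "freebsd", "hpux", "independent",
             "ios", "linux", "macos", "pixos", "sharepoint", "solaris", "unix", "windows"]


def build_wrapper_schema(platforms: list[str]) -> bytes:
    """Return a schema document whose only content is one xsd:import per platform."""
    root = lxml.etree.Element(f"{{{XSD_NS}}}schema", nsmap={"xsd": XSD_NS})
    for name in platforms:
        imp = lxml.etree.SubElement(root, f"{{{XSD_NS}}}import")
        imp.set("namespace", f"{OVAL_DEF_NS}#{name}")
        imp.set("schemaLocation", f"{name}-definitions-schema.xsd")
    return lxml.etree.tostring(root, pretty_print=True, xml_declaration=True, encoding="utf-8")


print(build_wrapper_schema(PLATFORMS).decode("utf-8"))
```

Generating the file means that supporting a new platform schema is a one-word change to the list.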
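Passing this file as the schema in the Python script above allows any OVAL Definition file to be validated, regardless of which combination of the supported schemas it uses. The schemaLocation values are relative paths, resolved against the test schema's own location. The test schema should therefore sit in the same directory as the OVAL schema files. Note that lxml.etree.XMLSchema checks the XML Schema rules only; it does not evaluate the Schematron rules embedded in the OVAL schemas.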
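The approach can be checked end to end with stand-in schemas that mimic the OVAL layout. Here defs.xsd plays the role of oval-definitions-schema and declares an abstract test element. The files win.xsd and ind.xsd each add a test element to that element's substitution group, similar to how the platform schemas extend the definitions schema.

All file names, namespaces and element names are hypothetical. As before, the wrapper here has no targetNamespace.

```python
import os
import tempfile

import lxml.etree

# Hypothetical stand-in for oval-definitions-schema: an abstract head element "test".
DEFS_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:defs="urn:example:defs" targetNamespace="urn:example:defs"
    elementFormDefault="qualified">
  <xsd:complexType name="TestType">
    <xsd:attribute name="id" type="xsd:string" use="required"/>
  </xsd:complexType>
  <xsd:element name="test" type="defs:TestType" abstract="true"/>
  <xsd:element name="tests">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="defs:test" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

# Template for a platform schema that imports defs.xsd and adds one substitutable test.
PLATFORM_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:defs="urn:example:defs" targetNamespace="urn:example:{name}"
    elementFormDefault="qualified">
  <xsd:import namespace="urn:example:defs" schemaLocation="defs.xsd"/>
  <xsd:element name="{test}" type="defs:TestType" substitutionGroup="defs:test"/>
</xsd:schema>"""

# The wrapper declares nothing itself; it only imports both platform schemas.
WRAPPER_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:import namespace="urn:example:win" schemaLocation="win.xsd"/>
  <xsd:import namespace="urn:example:ind" schemaLocation="ind.xsd"/>
</xsd:schema>"""

# A document mixing elements from both platform namespaces.
DOCUMENT = """<defs:tests xmlns:defs="urn:example:defs"
    xmlns:win="urn:example:win" xmlns:ind="urn:example:ind">
  <win:registry_test id="t1"/>
  <ind:file_test id="t2"/>
</defs:tests>"""


def validate_against(schema_name: str) -> bool:
    """Write all stand-in schemas to a temp dir and validate DOCUMENT with one of them."""
    files = {
        "defs.xsd": DEFS_XSD,
        "win.xsd": PLATFORM_XSD.format(name="win", test="registry_test"),
        "ind.xsd": PLATFORM_XSD.format(name="ind", test="file_test"),
        "wrapper.xsd": WRAPPER_XSD,
    }
    with tempfile.TemporaryDirectory() as tmp:
        for file_name, text in files.items():
            with open(os.path.join(tmp, file_name), "w", encoding="utf-8") as fh:
                fh.write(text)
        schema = lxml.etree.XMLSchema(file=os.path.join(tmp, schema_name))
        return schema.validate(lxml.etree.fromstring(DOCUMENT))


for name in ("win.xsd", "ind.xsd", "wrapper.xsd"):
    print(name, validate_against(name))
```

Validating the mixed document against win.xsd or ind.xsd alone should fail. The wrapper that imports both should succeed.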