Gather & Prepare Data for Data Analytics in 7 Steps
Step 1: Identify the Questions and the Assertions You Want to Answer
The Auditor must first understand what the objective is for the audit and what assurances are being tested, including verification that:
- All assets, liabilities, and equity interests have been recorded and disclosed correctly.
- Confirm the existence of assets, liabilities, and equity.
- Valuation of Assets, liabilities, and equity are reported at the correct amount.
- Transactions and events are recorded in the correct accounting period.
Step 2: Confirm the Data Requirements to Meet the Audit Objectives
Once the objectives and assurances have been identified, the data requirements to meet these objectives can be determined to ensure that the Auditor can draw reasonable conclusions.
Determine what data exists to meet the objectives set and how much data is needed to provide answers for objectives and assurances the Auditor is trying to gain.
In this section, rather than referring to all of the specific data requirements needed for the testing objectives in this guide, all of the relevant data requirements for each of the business processes are detailed within each of their specific sections.
Step 3: Determine the Location or Source of Data and Understand Business Processes and Procedures
This step is critical to understand where the information is stored and how information flows through the business and accompanying systems.
It is also important to identify all of the individuals within the business process who will need to be involved in the data collection process. This can include, but not be limited to:
- Key stakeholders who may be needed to sign off on the relevant data collection.
- Key contact who the Auditor will work with to identify the information that the audit requires.
- An individual who will perform the data extraction. This could be a member of staff who works within the business process itself or depending on the size of the organization may be a member of the IT department/database administrator.
Step 4: Refine Data Requirements
Some of the testing objectives that would have been defined in the early stages of the audit, as well as from discussions with the client themselves, may also be based on previous experience with other similar clients or from information obtained from professional bodies or solution providers. Due to this, during the third stage of this process, it may become apparent that the client does not record or have access to certain information that is required. Depending on any regulatory requirements that require the client to provide this information or further opportunity to access this information (Outsource to a 3rd party to produce a report with the required information), you may have to reduce some of your data requirements and testing objectives.
Step 5: Determine the Format of the Data
It is recommended that you provide guidance to determine the type of file formats that the client should select when extracting data from their systems. It is worth noting that in most cases many of the common file formats have their own issues/limitations. Incorrect selection of a file format can potentially cause reporting on inaccurate data. It can also cause delays in the Data Analytics process due to the need for manual modification of the data to put it into the correct format.
This will need to be determined on a client by client basis. Some organizations use 3rd party tools to extract the data from their systems. This means that although the Auditor may have come across the accounting or ERP (Enterprise Resource Planning) software with another client, the formats available may vary.
In order to reduce any issues with requesting the wrong format, it is recommended that the Auditor works with the individual tasked with extracting the data. Based on the advice in this guide and in due course personal experience, identify the top three file formats and extract a small subset of the data (1 weeks' worth) and import them into IDEA. This should hopefully identify any importation issues and identify the relevant file format to be used when extracting the full periods worth of data.
CaseWare IDEA has the ability to import data from a variety of file formats including AS400, dBase, Microsoft Access, Microsoft Excel, Print reports and Adobe PDF, SAP/AIS, Text and XML.
Three of the most common types of file formats available are Microsoft Excel, delimited text files, and Print reports.
Microsoft Excel
Microsoft Excel is one of the most common file types used and can be imported into IDEA.
There are some specific rules about the structure that the data must be stored within the Excel file. For example, there must be:
-
- No blank rows or columns
- No title rows
- No sub-totals or grand-totalsThis means that the data must be a straight extract from the system without any grouping or formatting applied. The Auditor must be aware that reports in a print to view/paper format can be output to an Excel file format and will most likely require manual modification to remove groupings, blank lines, subtotals and grand totals which in itself is open to potential mistakes. One reason that the Microsoft Excel format should not potentially be the first considered format, is due to the fact that depending on the quality of the reporting software all values will be written into the Excel file in the 'General' format. The potential impact being that reference numbers will be treated as numbers and leading zeros will be removed. Specific examples of this would be telephone numbers and bank sort codes and account numbers. Should the Auditor want to compare these to other datasets, this could lead to valid matches not being identified.
Plain Text Files
Plain text files are commonly used to transfer large files between companies and systems. A plain text file is a file storing tabular data in plain text.
There are two main types of plain text files that IDEA can import:
Fixed Length
Common file extensions include .txt, and .fxd.
They are known as fixed length due to the fact that each record in the file is exactly the same amount of characters and that the fields contained within the records always start in the same character position.
Delimited
One of the most common types of delimited files are CSV (Comma Separated Value) files. Other common file extensions include .asc, and .del.
Unlike fixed-length files that use spaces to ensure that fields start in the same position, delimited files use special characters to define where fields start and end and is the most efficient type of text file, due to the fact that the records are only as long as the values contained within them.
It is important to note that delimited text files can sometimes be exported from core systems configured to use different delimiters. Examples include tab, colon, semicolon, and space. The choice of delimiter can, in some cases, be critical as common delimiters can also exist within the actual data itself, for example, commas can also be used in names and addresses.
If the option to select your own delimiter is available, we recommend selecting pipe (|), tilde (~), or not sign (¬). In essence, any selected delimiter must not exist within any of the records other than being used as a delimiter.
Another option sometimes available is the inclusion of text encapsulators, the most common being quote marks (""). By wrapping encapsulators around text values any field delimiters that occur between the encapsulators are ignored (e.g., "Smith, John", "12, High Street",).
Whether it be a delimited or fixed length file that is chosen, it is advised to obtain technical information about the file's record layout. Key information which will assist when importing these files include:
-
- Field Name (If not already included within the files themselves)
- Field Type (Numeric, date, time, character)
- Delimiter (Delimited file only)
- Encapsulator (Delimited file only)
- Field length (Fixed length files only)
- Number of decimals
- Thousands separator and decimal point characters.
Print Reports
Print reports are probably one of the easiest file types to obtain as nearly all systems have the ability to produce printed reports and are simply an electronic version of a printed document and often contains extra formatting and information. The additional formatting is used to make the printed document as clear as possible for the reader.
IDEA has the capability to import print reports using a built-in tool, Report Reader, and can extract the required data for import from the file. Importing a print report again allows for a greater degree of control, as it allows the user to select and import the fields required for analysis, add in field descriptions and amend field types prior to import.
Bear in mind that although this is a powerful functionality that allows the Auditor to convert print to view reports into structured data for easy analysis, this file type should only be considered if less time-consuming options are not available.
For more guidance on the process of importing data into IDEA, refer to the IDEA and Report Reader Tutorials located in the IDEA documentation folder. More information and guidance can also be found using the IDEA Help function (F1). Also note that if you have access to CaseWare Passport support website further information, tutorials, and videos are also available.


Conclusion
It is important to note that regardless of the type of audit being completed an accurate and efficient data collection process is essential to maintaining the integrity and timeline of the audit or analysis. Understanding the data, the formats available, the data sources and any data restrictions in place can reduce the likelihood of errors occurring. A data collection process is essential to ensure that the data gathered is both well-defined and accurate. This provides confidence that the assurances tested are valid and true. For more information check out our Analytics product offeringMORE RESOURCES ARE AVAILABLE HERE
- Recorded webinar Using Analytics in Audit – With Examples
- Whitepaper - Is Better Planning the Key to an Efficient Audit
- Product page information, with ebook publication "Analytics for External Auditors",
]]>