
Transaction Validation and Analysis

January 14, 2020

A Major Qualifying Project Report
Submitted to the Faculty of Worcester Polytechnic Institute
in partial fulfilment of the requirements for the Degree of Bachelor of Science

Project ID: 14985

Project Team:
Manasi Danke, CS
Ethan Merrill, MGE
Joseph Yuen, CS

Project Advisors:
Michael Ginzberg, Business Department
Robert Sarnie, Business Department
Wilson Wong, Computer Science Department

Sponsored by: Hedge Fund Company


Acknowledgements

First, we would like to thank our sponsor for the amazing opportunity to learn about financial technology and to assimilate into the company culture.

We would also like to thank our WPI advisors, Professors Michael Ginzberg, Robert Sarnie, and Wilson Wong, for their availability and support. Our regular meetings with them encouraged us and taught us how to be agile in the financial industry.

Lastly, we would like to thank the open source community for their extensive documentation and tutorials. We were able to learn a wide array of new technologies thanks to these valuable resources.

Thank you,

Manasi Danke

Ethan Merrill

Joseph Yuen


Table of Contents

Table of Figures
Abstract
Executive Summary
1. Introduction
   1.1 Problem
      Thread 1 (Winners and Losers Report Update)
      Thread 2 (Azure Validation Dashboard)
   1.2 Goals
   1.3 Deliverables
      Thread 1
      Thread 2
2. Background
   2.1 Finance Industry
      2.1.1 Financial Reporting
      2.1.2 Accounting Validation
      2.1.3 Current System
      2.2.4 Previous Work
   2.3 Software Development Environment
      2.3.1 Python
      2.3.2 Pandas
      2.3.3 Anaconda
      2.3.4 Apache Spark and Databricks
      2.3.5 Power Business Intelligence (Power BI)
      2.3.6 Microsoft Azure Data Lake
      2.3.7 Project Management Tools
      2.3.8 Source Control
3. Methodology
   3.1 Project Management
   3.2 Choosing a Methodology
      3.2.1 Scrum
      3.2.2 Risk Management
      3.2.3 Requirements Gathering
4. Requirements Gathering
   4.1 Sprint Planning Meetings
   4.2 Sponsor Communication
      4.2.1 Daily Scrums
      4.2.2 Product Demonstrations
      4.2.3 Interviews
5. Analysis
   5.1 Epics & Themes
      Epic 1: Improve Winners Losers Report Generator
      Epic 2: Azure Validation Dashboard (Validator)
   5.2 User Stories
6. Design
   6.1 System Architecture
   6.2 Data Flow Diagram (DFD)
   6.3 Entity Relationship Diagram (ERD)
   6.4 Use Case Diagrams
   6.5 User Interface Structure Diagram
   6.6 User Experience
      6.6.1 Home
      6.6.2 Commentary
      6.6.3 Alerts
      6.6.4 History
   6.7 Design Patterns
      6.7.1 Strategy Pattern
7. Implementation
   Pre-Qualifying Project Work
   Sprint 1
      User Stories Completed
      Sprint Review
      Sprint Retrospective Meeting
   Sprint 2
      User Stories Completed
      Sprint Review
      Sprint Retrospective Meeting
   Sprint 3
      User Stories Completed
      Sprint Review
      Sprint Retrospective Meeting
      Project Risks
   Sprint 4
      User Stories Completed
      Sprint Review
      Sprint Retrospective Meeting
      Project Risks
   Sprint 5
      User Stories
      Sprint Review
      Sprint Retrospective Meeting
      Project Risks
   Sprint 6
      User Stories
      Sprint Review
      Sprint Retrospective Meeting
      Project Risks
   Weekly Burndown
8. Testing
   8.1 Quality Assurance Procedure
   8.2 User Feedback
9. Future Work
   9.1 Thread 1
      9.1.1 Modularize Strategies Further
      9.1.2 Modularize Pre-Processing Functions Further
      9.1.3 Determine User Base
   9.2 Thread 2
      9.2.1 Add More Timeseries Data to Datalake
      9.2.2 Schedule Script
      9.2.3 Add More Alerts and Analysis
      9.2.4 Add More Fields to Data Frame
      9.2.5 Create Summary Page
10. Learning Assessment
   10.1 Challenges
      1. Identifying Requirements
      2. Planning VS Execution
      3. Domain Knowledge
      4. Optimization
   10.2 Learnings
      10.2.1 Computer Science
      10.2.2 Project Management
   10.3 What we would do differently
      1. Determine needs of client – priorities and whether it is a want or a need
      2. Establish capability of tools with client
      3. Testing
      4. Team Communication
      5. Technical Mentors
11. Conclusion
Works Cited
Appendix
   APPENDIX A: User Stories
   APPENDIX B: Project Risks Per Sprint
   APPENDIX C: Interview 1 with Firm Accountants
   APPENDIX E: Financial Terminology
      Asset Valuation
      Internal Rate of Return (IRR)
      Multiple of Invested Capital (MOIC)
      Gross Profit
      Remaining Market Value (remMV)
      Total Cost
      Total Sales
      Total Terminal Value
      Return Period
   APPENDIX F: Site Map
   APPENDIX G: Site Structure Diagram


Table of Figures

Figure 2.0 Financial Structure
Figure 3.0 Methodology Comparison Chart
Figure 3.1 Product Backlog
Figure 3.2 Burndown Chart Guide
Figure 3.3 Risk Management Framework
Figure 6.0 System Architecture Diagram
Figure 6.1 Context and Level 0 Diagram
Figure 6.2 Data Flow Diagram Level 1
Figure 6.3 As Is Data Lake Entity Relationship Diagram
Figure 6.4 New Data Lake Entity Relationship Diagram
Figure 6.5 results_and_flows Data Frame Entity Relationship Diagram
Figure 6.6 irr_timeseries Data Frame Entity Relationship Diagram
Figure 6.7 Use Case Diagram
Figure 6.8 User Interface Structure Diagram
Figure 6.9 Power BI Home
Figure 6.10 Power BI Commentary
Figure 6.11 Power BI Alerts
Figure 6.12 Power BI History Timeseries


Abstract

While working for a large investment firm, our team worked on two projects ('threads'), both of which enhanced the company's reporting capabilities. We updated the firm's system for portfolio performance tracking and reporting and created a tool to confirm, automate, and customize commentary describing investment performance. For our first thread, we provided documentation on how to utilize and manipulate code to add and auto-populate data for new columns on the Winners and Losers Report. For our second thread, we created an Azure Validation Dashboard to work with the firm's new cloud data management infrastructure and operate consistently with the other firm systems. We developed scripts to validate transaction data, generate commentary on top contributors and detractors for gross profit month over month, and utilize timeseries data to investigate trends and perform statistical analysis. Our dashboard visualizes data with Microsoft Power Business Intelligence and gives users the ability to customize their view and drill through the raw data to find what caused certain alerts and movements in the performance of strategies. Throughout the project, we used Agile Scrum in a team of three to deliver and document software solutions that provide efficiency and flexibility for the firm, its information technology analysts, its accountants, and its clients.


Executive Summary

The firm is an alternative investment manager that focuses on credit, private equity, real estate, and multi-strategy investing. As a company in the financial technology industry, it utilizes reporting tools, data analytics, and financial indicators to evaluate how well its strategies are performing. It has a Winners and Losers report, along with a product created by the past MQP team to generate and customize that document. The past MQP aimed to simplify the workflow and interface for generating these reports, but the customization components of the project were not being fully utilized because their setup and use instructions were outdated. In order to effectively showcase the customization components, we modified the code and recorded a tutorial on how to add and modify columns in the Winners and Losers report. During this first project (Thread 1), we worked with the firm to enhance its reporting systems.

Additionally, for Thread 2 we explored the firm's new Azure cloud system to validate data and generate commentary on what caused changes in performance for its investment strategies. The firm requested this functionality because its existing processes for validation and commentary generation were labor intensive. The firm also sought to perform advanced timeseries analyses to make more informed investment decisions. As a result, we wrote scripts in Databricks that queried data from Azure Data Lake and utilized SQL, Spark, and Pandas to analyze the data and display visualizations in Power Business Intelligence (Power BI) for our Azure Validation Dashboard.

In order to validate data that was uploaded to the Data Lake, we flagged incorrect data and generated specific alerts in our dashboard. The validation checks cover a range of potential flaws in the data, from missing information to suspicious performance metrics. In addition, we developed commentary showing top contributors and detractors using the change in gross profit for the return periods Inception to Date, Year to Date, 1 Year, 3 Year, and 5 Year. Furthermore, we used statistical analysis and created a timeseries to evaluate different trends in the data. Our dashboard presents these features in different panes and enables users to drill through to see which raw data points accounted for alerts, performance, and timeseries movements. Overall, our dashboard supports the firm's IT analysts, accountants, and clients in examining their data and gives them the power to customize their view to further investigate the reason behind movements.

Throughout the duration of the project, we utilized the Scrum Agile methodology along with a Kanban interface in Airtable. We conducted daily standup meetings and created user stories for seven one-week sprints. Towards the end of each sprint, we reflected on the week and used client feedback to write user stories and continue sprint planning for the next week. This methodology enabled us to plan realistically and make continuous improvements to our software product.


1. Introduction

1.1 Problem

As technology advances, financial institutions seek to utilize the latest technology to stay ahead of their competitors. This hedge fund management firm is no exception: it uses a variety of technologies, such as cloud storage, visualization programs, and data manipulation software, in its daily workflow. All of these technologies are used to make informed investment decisions and communicate fund performance to stakeholders.

The firm recently updated its data management infrastructure to a new cloud database and wanted to enhance its reporting systems to reap the benefits of the new infrastructure. Until this upgrade, historical pricing and performance data was not stored in a way that allowed for easy analysis across multiple time periods.

By placing all available data in an Azure cloud database, the firm's IT employees could quickly and programmatically access all of the firm's historical transaction data using Python scripts. Using this access, the IT employees generated historical performance metrics such as Internal Rate of Return (IRR) and Multiple of Invested Capital (MOIC), which allowed further analysis of the historical changes in these metrics. However, although the data was more accessible, it was not utilized by non-IT employees. The firm's accountants and analysts, the individuals who could use this data most, were unfamiliar with how to access the cloud database.
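
For reference, the two metrics mentioned above can be computed from a strategy's cash flows. The sketch below is illustrative only: the cash flows are invented, the bracketing bounds are assumptions, and this is not the firm's production calculation. It uses SciPy, one of the libraries discussed later in this report.

```python
from scipy.optimize import brentq

# Hypothetical, evenly spaced cash flows for one strategy:
# negative = capital invested, positive = distributions received.
cash_flows = [-1_000_000, -250_000, 0, 300_000, 450_000, 700_000]

def npv(rate, flows):
    """Net present value of evenly spaced flows at a per-period discount rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(flows))

# IRR is the rate that makes NPV zero; the bracket is chosen for illustration.
irr = brentq(lambda r: npv(r, cash_flows), -0.99, 10)

invested = -sum(cf for cf in cash_flows if cf < 0)
distributed = sum(cf for cf in cash_flows if cf > 0)
moic = distributed / invested  # Multiple of Invested Capital

print(f"Per-period IRR: {irr:.2%}, MOIC: {moic:.2f}x")
```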

Thread 1 (Winners and Losers Report Update)

The firm's analysts required additional tools for the static investment report generator created by a previous MQP group. Users did not know how to modify the output of the Winners and Losers Report generator because the documentation for the generator was outdated.

Thread 2 (Azure Validation Dashboard)

The firm's analysts required an enhanced reporting system that ensured accounting data was correct and generated human-readable reports. The company had loaded a portion of its raw transaction data into a new cloud database, but the data still needed to be validated and used to inform analysts. The data came directly from the firm's automated accounting system, and the firm asked our team to verify it using a variety of tests. Additionally, the firm wanted to analyze historical investment performance data in its new cloud database, as the data was neither accessible nor in a format easily comprehensible to a firm analyst.

1.2 Goals

Our project goal was to improve the way the firm validates its investment performance data and how that data is presented to various stakeholders.

The purpose of validating data is to make sure that all the information for each reporting period is correct. These validation checks are designed to highlight abnormal activity: some checks were binary, while others used thresholds to validate the data. By communicating with those who understood the data and performed the checks manually, we developed checks that met the needs of the firm's analysts.
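
To make the distinction concrete, the sketch below shows one binary check and one threshold check over a small Pandas Data Frame. The column names, bounds, and alert format are assumptions for illustration, not the firm's actual rules or schema.

```python
import pandas as pd

# Hypothetical transaction extract; real column names in the firm's Data Lake differ.
transactions = pd.DataFrame({
    "Strategy": ["ABC1", "DEF2", None],
    "GrossProfit": [1_200_000.0, -350_000.0, 15_000.0],
    "IRR": [0.18, -4.2, 0.07],  # IRR stored as a decimal fraction
})

alerts = []

# Binary check: a row either has a Strategy code or it does not.
missing_strategy = transactions[transactions["Strategy"].isna()]
for idx in missing_strategy.index:
    alerts.append({"row": idx, "check": "missing_strategy", "severity": "error"})

# Threshold check: flag IRR values outside a plausible range (assumed bounds).
IRR_LOWER, IRR_UPPER = -1.0, 3.0
suspicious_irr = transactions[(transactions["IRR"] < IRR_LOWER) | (transactions["IRR"] > IRR_UPPER)]
for idx in suspicious_irr.index:
    alerts.append({"row": idx, "check": "suspicious_irr", "severity": "warning"})

print(pd.DataFrame(alerts))
```

In the real dashboard, alerts like these are what the user drills through to find the underlying raw data points.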


Additionally, we aimed to present the validation results and other information in a way that was not overwhelming to the average user and made intuitive sense. A software tool is only useful if it is understood, so design and presentation were a key priority during development.

We aimed to improve these processes to save the firm's employees time, improve the accuracy of their reports, and generate new insights from their data. Each of our two threads has smaller sub-goals.

Thread 1 focused on updating documentation for an existing tool:

• Maintainable – The previous code was difficult to understand, so we made the code easy to maintain and extend in future development.

For Thread 2, we aimed to improve how the firm works with and views data in its cloud database:

• Robust – The firm's cloud database holds information that is critical to primary business functions. Our software had to be reliable because of the importance of this data.

• Cloud Independent – We had to minimize our dependency on the Azure cloud infrastructure. The firm has a long-term goal of being cloud independent, so it wants to limit the number of Microsoft Azure-specific integrations in its software.

• Transparent Calculations – To prove that calculations are correct, we needed to display supporting data points and the information used to derive each calculation.

1.3 Deliverables

Thread 1

After assessing the current system, we concluded that the structure and capabilities of the codebase

were adequate. The documentation for the codebase, however, was not up to date. Therefore, we

created improved documentation on how to add, modify, or remove columns from the program-

generated Excel report.

Thread 2

We built a Power BI dashboard to display the firm's investing records from its Azure Data Lake. We wrote back-end Python scripts in Databricks to perform various analyses on the data and designed an intuitive Power BI dashboard for business analysts and accountants to view these analyses.
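
One of the analyses behind these deliverables, described in the Executive Summary, ranks top contributors and detractors by the change in gross profit between reporting periods. A minimal sketch of that idea follows, using invented column names and values rather than the firm's schema.

```python
import pandas as pd

# Hypothetical gross profit by strategy for two consecutive reporting periods.
current = pd.DataFrame({"Strategy": ["ABC1", "DEF2", "GHI3"],
                        "GrossProfit": [520.0, 140.0, -60.0]})
previous = pd.DataFrame({"Strategy": ["ABC1", "DEF2", "GHI3"],
                         "GrossProfit": [450.0, 210.0, -20.0]})

# Change in gross profit per strategy between the two periods.
change = current.merge(previous, on="Strategy", suffixes=("_curr", "_prev"))
change["Delta"] = change["GrossProfit_curr"] - change["GrossProfit_prev"]

# Top contributors are the largest positive moves; detractors the largest negative ones.
contributors = change.nlargest(2, "Delta")[["Strategy", "Delta"]]
detractors = change.nsmallest(2, "Delta")[["Strategy", "Delta"]]

print("Top contributors:\n", contributors)
print("Top detractors:\n", detractors)
```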


2. Background

2.1 Finance Industry

2.1.1 Financial Reporting

Financial reporting is designed to inform investors, build trust, and comply with federal regulations. Financial statements are issued to build investors' trust in an institution. For publicly traded investments, these statements are required by law and must adhere to Generally Accepted Accounting Principles. Private funds such as the firm are required to register with the Securities and Exchange Commission as investment advisors because of the Dodd-Frank Wall Street Reform and Consumer Protection Act of 2010 (Eriksson, 2016). The accounting standards used by hedge funds and private equity firms are the Global Investment Performance Standards. To understand fund performance and to address investor inquiries, most funds generate reports internally and distribute them to investors on a monthly or quarterly basis. From the perspective of investors, it is of the utmost importance to understand how their money is being invested and how those investments have performed historically (Securities and Exchange Commission, 2012). Financial reporting uses a variety of calculations to value investments and assess their performance.

2.1.2 Accounting Validation

The accuracy of these statements is just as important as issuing reports and statements on performance. Any retraction or correction of financial statements has a significant negative impact on investor trust in the institution. As a result, financial reports are thoroughly checked for accuracy before they are published. This procedure is typically performed by accounting teams through a series of checks in various Excel sheets. These checks are based on a variety of rules that identify data which may be incorrect due to human error, a reporting failure, or other causes.

2.1.3 Current System

Financial Structure

The firm organizes its assets in a hierarchy, as shown in Figure 2.0, which allows the company to aggregate and abstract performance and other information at varying levels.

First, the firm has various business units. These units are the highest level of organization within the company; a business unit is an overarching part of the firm's business, such as Energy or Distressed. Our team worked with data in the Distressed and Energy business units of the firm.

Within business units there are many portfolios. Portfolios are collections of individual investments, which are known as Strategies at the firm. In other words, a portfolio is a bundle of assets grouped for the purpose of tracking performance and management. A portfolio can belong to multiple business units, and portfolios are managed by different teams within the firm. Each portfolio has its own objectives that justify the investments it makes and how those investments are maintained.

The level below portfolio is Strategy. A Strategy is a specific investment, such as a company or a piece of real estate. Each Strategy has a synonymous Deal Name, which is more descriptive: the Deal Name is often a company name and is used by analysts more than the Strategy code itself, which is a combination of letters and numbers. Each Strategy can also belong to many regions, and this information is used to determine where the firm's risk is located geographically.

Page 14: Transaction Validation and Analysis

4

Below the Strategy code is the Sycode. There can be many Sycodes for each Strategy, although there is often just one. A Sycode identifies the type of financial instrument used to interact with the investment: it could describe a stock purchase, a derivatives contract, an option purchase, or any other kind of financial instrument traded on a Strategy. Sycodes are derived from a combination of other given fields, such as TransactionType, TradeDate, and Strategy. Sycode sits at this level of the hierarchy because one Strategy can involve many types of financial instruments, and those instruments can be bought, sold, and so on; the TransactionType field describes these transactions.

Figure 2.0 Financial Structure
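
To make the hierarchy concrete, the sketch below models it as nested Python dataclasses. The field names and example values are illustrative assumptions, not the firm's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Sycode:
    code: str                 # financial instrument identifier
    transaction_type: str     # e.g. "Buy" or "Sell"

@dataclass
class Strategy:
    strategy_code: str        # letter/number code used internally
    deal_name: str            # descriptive name preferred by analysts
    regions: List[str] = field(default_factory=list)
    sycodes: List[Sycode] = field(default_factory=list)

@dataclass
class Portfolio:
    name: str
    strategies: List[Strategy] = field(default_factory=list)

@dataclass
class BusinessUnit:
    name: str                 # e.g. "Energy" or "Distressed"
    portfolios: List[Portfolio] = field(default_factory=list)

# Example: one business unit -> portfolio -> strategy -> sycode.
unit = BusinessUnit(
    name="Energy",
    portfolios=[Portfolio(
        name="Example Portfolio",
        strategies=[Strategy(
            strategy_code="ABC1",
            deal_name="Acme Holdings",
            regions=["North America"],
            sycodes=[Sycode(code="ABC1-EQ", transaction_type="Buy")],
        )],
    )],
)
print(unit.portfolios[0].strategies[0].deal_name)
```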

2.2.4 Previous Work

To ensure that our software was well integrated with the firm's processes and to avoid replicating previous work, we researched past projects that interface with the firm's systems. After speaking with our project sponsor and advisors, we found that Thread 2 is a 'greenfield' project: there is no prior work or research on it. We did find the previous WPI MQP project that is the basis for Thread 1; that project is summarized below.

Wall Street: Engineering Investment Profit & Loss Reporting Pipeline

This MQP team from 2017 was tasked with automating and streamlining an internal report built by Scott Burton called the 'Winners and Losers' report. The report was originally populated manually by members of various departments, so producing it took too much time. The project aimed to limit these points of failure with a software system that would automatically retrieve data to populate and format the report. The data was retrieved from the Geneva accounting system with a custom script. Data in this form is called a 'Geneva extract' and is in CSV format, which is easily manipulated using Python and Excel.

The team built the program entirely in Python and relied on OpenPyXL to interact with Excel. Excel was used because it is already used and well understood at the firm. The previous MQP project worked to develop the capability to add or remove columns. The report states that a regular expression is used to update cell references after a new column has been added. The report describes this feature as working but mentions difficulty in implementing it. These difficulties included:

• Manipulating formatting could result in 'broken headers', which caused the file to be corrupted. The report does not make clear whether the corrupted file is the XML or the Excel file.

• After updating the cell references through regular expressions, the number in a given fund name would auto-increment in the Excel sheet. This problem was overcome by including more underscores in the naming of the funds in the sheet (for example, FUND_CAT_53 instead of FUND_CAT50).

• Python's duck typing made it difficult to ensure that the data types used in the Excel sheet were allowed. OpenPyXL and the program developed by the MQP team both implemented checks to ensure only proper typing was allowed in the Excel sheet.
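
For context, the snippet below shows the kind of OpenPyXL operation involved in adding a column. It is a minimal, hypothetical sketch (the file, sheet, and column names are invented) and is not the previous team's implementation; note that OpenPyXL's insert_cols does not rewrite existing formula strings, which is consistent with that team's need to update cell references with regular expressions.

```python
from openpyxl import load_workbook

# Illustrative only: file name, sheet name, and column position are hypothetical.
wb = load_workbook("winners_losers.xlsx")
ws = wb["Report"]

# Insert a new column before column C; existing data shifts right,
# but formula strings referencing old columns are NOT rewritten automatically.
ws.insert_cols(3)
ws.cell(row=1, column=3, value="New Metric")

wb.save("winners_losers_updated.xlsx")
```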

2.3 Software Development Environment

2.3.1 Python

Python is an open source, object-oriented programming language that has emerged as the standard for data science in recent years. Python has an extensive standard library and is considered a high-level language. In addition to its standard library, the open source community has built numerous installable packages that expand the language's functionality (Python Software Foundation, 2020).

The firm recommended that the team use Python because it is used in many of the firm's existing programs. Python allows developers to utilize packages such as Pandas to perform data manipulation over large data sets with little code. Python also has statistics libraries, such as SciPy, that make it straightforward to run statistical analysis on a data set. As a result, we selected Python as our back-end programming language.
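
As an illustration of the kind of statistical analysis SciPy enables, the sketch below fits a linear trend to a small, made-up monthly IRR timeseries. The numbers are invented and this is not the specific analysis used in the project.

```python
import numpy as np
from scipy import stats

# Hypothetical monthly IRR observations for one strategy (decimal fractions).
irr_series = np.array([0.081, 0.084, 0.079, 0.090, 0.095, 0.093, 0.101])
months = np.arange(len(irr_series))

# Ordinary least-squares trend line: a positive slope suggests improving performance.
result = stats.linregress(months, irr_series)
print(f"Monthly trend: {result.slope:+.4f}, R^2: {result.rvalue ** 2:.2f}")
```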

2.3.2 Pandas

Pandas is a Python package designed to manipulate and manage data sets. Pandas uses Data Frames to hold information. These frames are two-dimensional, stored in local memory, and analogous to Excel spreadsheets in structure. Because data is stored in this rigid structure, Pandas is commonly used with relational SQL databases and with CSV or TSV files.

Pandas data structures are faster than native Python structures for manipulating large datasets. With Pandas, joins, unions, merges, and other data manipulation operations can be performed with a few commands on a Data Frame. Additionally, Pandas is widely used and documentation is readily available. Finally, the firm already uses Pandas in many of its projects and recommended that we use it (Pandas, 2019).
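
A minimal sketch of the kind of Data Frame operations described above (a join followed by an aggregation), using made-up column names rather than the firm's schema:

```python
import pandas as pd

# Hypothetical extracts; the real Geneva/Data Lake schemas differ.
strategies = pd.DataFrame({
    "Strategy": ["ABC1", "DEF2"],
    "DealName": ["Acme Holdings", "Delta Energy"],
})
flows = pd.DataFrame({
    "Strategy": ["ABC1", "ABC1", "DEF2"],
    "GrossProfit": [120.0, -30.0, 55.0],
})

# Join the descriptive deal names onto the transaction rows...
merged = flows.merge(strategies, on="Strategy", how="left")

# ...then aggregate gross profit per deal in a single call.
summary = merged.groupby("DealName", as_index=False)["GrossProfit"].sum()
print(summary)
```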


2.3.3 Anaconda

Anaconda is an open-source distribution, used primarily for Python and R, that makes it easier to manage development environments and deploy packages. Anaconda's package and version management system is called Conda. When we were onboarded to Thread 1, we found that the project was already using Conda to manage its package versions. Since the thread did not require any additional packages and Conda kept our packages consistent across our local computers, we decided to keep using Anaconda.

2.3.4 Apache Spark and Databricks

Apache Spark is a big data and machine learning analytics engine. Spark SQL, a module of Spark, aids in structured data processing. It provides users with Data Frames, a programming abstraction that organizes data into rows and named columns, much like a relational database. Spark can also act as a distributed SQL query engine that manages logically interrelated databases over a computer network (Databricks, 2019).

Databricks is a development platform that is optimized for and integrated with Microsoft's Azure cloud services platform. It is based on Apache Spark and provides streamlined workflows and an interactive workspace to increase collaboration between business analysts and data engineers. Azure Data Factory enables raw or unstructured data to persist and be stored in Azure Data Lake. Databricks can then read that data with Spark using Spark SQL. In addition, Databricks is integrated with Power BI to share analytics, insights, and visual representations of data quickly and easily via Spark (Microsoft Azure, 2019b).

Since the Thread 2 data was already stored in an Azure Data Lake, the firm requested that we use Apache Spark and Databricks to access and manipulate the data. Beyond the firm's request, Apache Spark clusters also allowed for easy integration with Power BI. Although Spark has its own Data Frame manipulation similar to Pandas, Spark Data Frame documentation was limited compared to Pandas documentation. In addition, we used Databricks because of its simple integration with Azure and Spark. While other Python notebook development environments such as Jupyter exist, Databricks is integrated with the Azure system far more than Jupyter. In Databricks, users can easily browse the Azure data tables for reference and then switch back to coding, all in the same program.
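
The sketch below illustrates the general pattern of querying a Spark table and handing a small result to Pandas, as described above. The table and column names are hypothetical and the query is not one of the project's actual scripts.

```python
from pyspark.sql import SparkSession

# In a Databricks notebook a SparkSession named `spark` is provided automatically;
# getOrCreate() makes this sketch runnable in other environments as well.
spark = SparkSession.builder.getOrCreate()

# Hypothetical table and columns; the firm's Data Lake tables are named differently.
transactions = spark.sql(
    "SELECT Strategy, GrossProfit FROM transactions WHERE ReportDate = '2019-11-30'"
)

# Aggregate in Spark, then hand the small result set to Pandas for further analysis.
per_strategy = transactions.groupBy("Strategy").sum("GrossProfit")
pdf = per_strategy.toPandas()
print(pdf.head())
```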

2.3.5 Power Business Intelligence (Power BI)

Power Business Intelligence (Power BI) is a Microsoft application that enables consumers, analysts, and developers to transform data and convey key insights via dashboards and reports. It enables users to connect to their data from Excel spreadsheets, the cloud, or its own hybrid data warehouses and to visualize and share these insights with others (Microsoft, 2020).

Overall, Power BI consists of three main components: the Power BI Windows desktop application, the online Software as a Service (SaaS) offering known as the Power BI Service, and mobile applications for Android and iOS devices. Power BI Desktop connects to data sources, shapes and models data, integrates with Python, and implements row-level security (RLS) so users are given the proper access to restricted data. A report can then be published to the Power BI Service and shared with end users who have access to the service or the mobile apps. These users can then view and interact with the data and insights.


Power BI is one of the largest and fastest growing applications implementing cloud computing for business intelligence. It allows developers to embed the Power BI API into their own applications and to extract data from a variety of data storage locations, and it lets them set user privileges so that timely reports reach the correct parties. Moreover, Power BI Report Server allows companies that do not store data in the cloud to deploy the application behind their own firewall.

Although there are other data visualization programs, such as Tableau, the firm requested that we remain within the Microsoft suite for seamless integration between Microsoft programs. In Power BI, the user can stream data from Spark tables without loading the entire dataset into Power BI, which would significantly increase the size of the report and slow down analysis.

2.3.6 Microsoft Azure Data Lake

Azure Data Lake is a Microsoft product that enables users to store data of any size. The Azure suite can also run programs and processes in languages such as SQL and Python, and it works well with big data technologies such as Spark and Hadoop.

The firm recently moved its data to a Microsoft Azure instance in order to store it in the cloud rather than locally on internal servers (Microsoft Azure, 2019a). In addition, the firm wanted to experiment with data analysis using tools such as Power BI and Databricks.

2.3.7 Project Management Tools

Airtable

Airtable is a cloud-based project management tool centered around modular shared tables. We used Airtable throughout the project to manage our Agile implementation; all parts of our Agile plan, from Sprint Planning to Sprint Review, were tracked in this tool. Tables can be linked together to tie User Stories to Themes and Themes to Epics. The software also allowed easy export of User Stories, calculation of Story Point totals, and totaling of hours worked. Airtable also supports different views of these tables: they can be viewed as a Kanban chart or as a table grouped by Sprint.

Although Trello also tracks tasks with a Kanban user interface, it lacks Airtable's additional workspaces: linked lists that hold information on sprints, risks, and other aspects of Agile within one platform. Airtable's broader feature set in comparison to other free alternatives made it the right tool for the job.

Communication Software

Throughout the duration of the project, it was essential to communicate with our WPI team members,

our project sponsor, our advisors, and the accountants to discuss project requirements and deliverables.

We utilized software tools such as Skype, Outlook, and Slack to facilitate these conversations.

We used Skype to send messages in real-time to our project sponsor and other employees to confirm

meetings, clarify details, and ask questions that required minimal explanation. This platform was used

internally at the firm, so it was the logical choice for our team to use it when on site.


We used WPI and the firm's Outlook accounts to send important files to our project sponsor, our advisors, and the accountants. Additionally, we used Outlook's calendar feature to view their availability and schedule meetings through Outlook invites. Like Skype, Outlook was integrated into the firm's culture, so we used it to communicate.

Furthermore, we utilized Slack to communicate. Slack is an online instant messaging platform designed for project communication. Messaging is organized into channels, so we were able to discuss various project-related topics in parallel. Since instant messaging requires less time and formality than email, the platform encourages constant communication, which can enable more efficient execution of project goals. Slack was used to discuss details pertaining to the project, especially when we were on campus at WPI or working remotely and could not be at the firm's office.

2.3.8 Source Control

Source control gives modern software development a system for integrating the individual edits that each member of a team makes to a code base. The firm implements Git through Azure Repos, and our first thread used Git. Git is an open source version control system with well defined procedures for use: first, a repository is made, and each member edits that repository. After users make edits, they issue pull requests, which allow other users to review the code before it is merged into the main branch. This process continues for all edits.

Most development for the Azure Validation Dashboard was performed in Databricks notebooks. These notebooks can be edited by two users at the same time, like a Google Document. Like online document tools, the Databricks notebooks had an integrated version history, which largely eliminated the need for a traditional version control system for that thread.


3. Methodology

3.1 Project Management

Modern software project management is a rapidly evolving and diverse field consisting of a range of processes and methodologies. Broadly, software projects follow the six activities of software development: analysis, design, implementation, testing, deployment, and maintenance. This is known as the Software Development Lifecycle (SDLC). These activities must be executed to successfully develop a piece of software; however, there are many ways of executing them at varying intervals and durations. These different forms of execution are called development methodologies.

3.2 Choosing a Methodology

Our team of three used the Scrum implementation of the Agile software development methodology. There are many different ways to approach organized software development. Waterfall is a rigid methodology wherein the development team focuses entirely on each development stage for a set period of time. In this method, requirements are set at the start of the project and do not change, and users have no input during the development process beyond the requirements gathering phase. The Waterfall methodology follows the entire SDLC over the full length of the project, while Agile loops through requirements gathering, analysis, and deployment in a series of rapid iterations (Radack, 2009).

Parallel development was created to address some of the time concerns of Waterfall development, one drawback of which is the inability to rapidly adapt to changes in project requirements. Parallel development involves splitting the project into multiple subprojects that are designed and implemented by smaller teams. This was not an optimal methodology for our project due to its procedural nature and our limited number of team members. A comparison of methodologies, adapted from Systems Analysis and Design, 6th Edition, by Dennis, Wixom, and Roth, is shown below:

Criterion                        Agile Scrum   Agile Kanban   Waterfall   Parallel   V-Model
Unclear/Changing Requirements    Good          Good           Poor        Poor       Poor
Complex Systems                  OK            OK             Good        Good       Good
Reliable Systems                 OK            OK             Good        Good       Excellent
Self-contained Projects          Good          Poor           Good        Good       Good
Short Time Schedule              Excellent     OK             Poor        Poor       Poor
Schedule Visibility              Good          Excellent      Poor        Poor       Poor

Figure 3.0 Methodology Comparison Chart (Dennis, Wixom, & Roth, 2015)

We found Agile to be the most suitable methodology due to a combination of factors. First, we had only seven weeks to rapidly create and deploy a product that met the customer's business requirements. This ruled out the Waterfall-based methodologies, such as V-Model and Waterfall itself, which are slower and offer little opportunity for end user feedback. Additionally, we worked on site and regularly communicated with our project sponsor regarding needs and requirements, so it was easy to loop feedback into the product quickly.

Also, the firm, along with most modern software teams, uses a form of the Agile methodology in its software development teams. Therefore, Agile meshed well with the firm's existing processes.

Agile was created in 2001 by a coalition of members from multiple different methodologies. These members developed the Agile Manifesto, a description of principles that should be adhered to when developing software (Beck et al., 2001). Since the writers of the manifesto each championed their own methodologies, Agile is more of an overarching collection of ideas that binds together many other project management methodologies, such as Extreme Programming, Scrum, and Adaptive Software Development. The manifesto and the sub-methodologies within it emphasize the importance of customer satisfaction through early and continuous delivery of valuable software, usually on the time scale of two weeks. Agile also stresses the importance of daily collaboration between businesspeople and developers; this collaboration is most efficient when information is communicated face to face. Agile further focuses on the autonomy and empowerment of individuals in selecting their own work, via the use of self-organizing teams. Finally, the best measure of progress is the delivery of working software; by using Agile, our software was always in a condition to be deployed (Beck et al., 2001). Our team was prepared to iterate quickly and develop efficiently with the use of an Agile methodology. Agile has many sub-methodologies, including Scrum, each with its own set of key activities and each useful for different teams and projects.

3.2.1 Scrum Scrum is comprised of roles, activities, artifacts, and rules. In our implementation of Scrum, one

individual was both the Scrum Master and Product Owner. The Scrum Master is the servant leader of

the software development team. This role is responsible for clearing blockers and providing process

leadership. The Product Owner is the central voice of the Scrum team and is usually more business

oriented. This individual defines what to do and the order in which to do it (Rubin, 2013).

Finally, our software development team consisted of two individuals forming a self-organizing team to

execute the development of the Epics. Because of the small size of this team, we assumed cross-

functional roles. All members were expected to contribute to the software development in this project.

However, well defined roles and responsibilities ensured that leadership and initiative were taken

quickly to guide and motivate the team, preventing decision paralysis.

Overall project tasks are built around a hierarchy of Epics, Themes, and User Stories. Epics are the

overarching large goals in a project. Epics would never be used in Sprints because they are very large

and not detailed (Atlassian, 2020). The team starts with Epics and through a series of conversations with

users and stakeholders is able to further refine subcategories of these Epics. These subcategories are

called Themes. Themes are an intermediary between the big picture Epics and small and specific User

Stories.


Figure 3.1 Product Backlog (Rubin, 2013)

User Stories are created by meeting with the users and having conversations to identify requirements.

User Stories can be written in multiple formats. Our team used the following format: As a (user) I want

to (feature) so that I can (outcome). Before each Sprint, User Stories were generated by the team. These

stories were usually derived from meetings held with users of the product in the prior week. Stories are

valued by the development team using the effort hours system (Rubin, 2013).

Scrum, like most Agile methodologies, is centered around Sprints. Although Sprints are usually two

weeks long, our Sprints lasted one week. We used shorter sprints because of the compressed timeline of

our project. After the Sprint Planning Meeting on Monday, we held daily Scrum meetings each morning.

Our project sponsor met with us daily and greatly assisted in guiding us throughout our project. During

these daily Scrums, also known as standups, each team member said what they finished since the last

daily Scrum, what they planned to complete before the next one, and any blockers which may have

inhibited their progress. At the end of each Sprint, the Scrum Master determined the Sprint velocity and whether the goals of the Sprint were met. Additionally, we carried over any incomplete tasks to the backlog for

the next week (Rubin, 2013).

We used a Kanban visualization of the tasks for each Sprint. This visualization was in our project

management tool: Airtable. Using the Kanban visualization, the team could easily see the project backlog, work in progress, and completed tasks. This visualization also made it possible to move tasks

between these categories.


After each sprint we reviewed our progress on a team and project level. This was performed using Sprint

Review and reflection meetings, respectively. During sprint review meetings, the team determined

which user stories were completed. Also, during this meeting, the scrum master calculated and

presented the team’s velocity. This gave us an understanding of our overall progress for the sprint and

opened conversation regarding the project risks. Also at the end of each sprint, we demoed the

application to our sponsor. The demo served to show our sponsor what progress had been made during

the week (Rubin, 2013).

The sprint reflection or retrospective was performed at the end of each week to continuously improve

the team’s development process. During each retrospective, each team member discussed what they

thought went well, what could be improved, and what they would commit to improving during the next

sprint (Rubin, 2013).

Finally, burndown charts are a sprint artifact which is used to determine team productivity and work

pace. The burndown chart can be created for user stories or hours worked per day or per sprint. A

representation of a burndown chart can be found below. Sprints or days are typically the x axis, while

work units are on the y axis. An ideal burndown chart is perfectly linear because the same amount of

work is performed each week (Rubin, 2013).

Figure 3.2 Burndown Chart Guide (Goncalves, 2019)
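A minimal sketch (not taken from the project) of how a Sprint burndown chart like the one in the figure above could be drawn with matplotlib: days on the x axis, remaining story points on the y axis, and the ideal linear burn overlaid for comparison. The numbers are illustrative.

    import matplotlib.pyplot as plt

    days = list(range(6))                    # Monday through the Friday review
    remaining = [150, 140, 118, 90, 55, 10]  # story points left at the end of each day
    ideal = [150 - 30 * d for d in days]     # a perfectly linear burn

    plt.plot(days, remaining, marker="o", label="Actual remaining work")
    plt.plot(days, ideal, linestyle="--", label="Ideal burndown")
    plt.xlabel("Day of Sprint")
    plt.ylabel("Story points remaining")
    plt.title("Sprint burndown")
    plt.legend()
    plt.show()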

By implementing the Agile Scrum project management methodology, we were able to efficiently deliver

a product which met the needs of interested parties.


3.2.2 Risk Management

In any project, risks may inhibit progress and make it difficult to complete the project to the client's

specifications. A key component of project management is identifying and tracking these risks. Our team

built a risk management framework which included the name of the risk, a brief description, its

category, the probability it would occur at the time the risk was created, its current status, and

mitigation plans as seen below.

Figure 3.3 Risk Management Framework

The risk was described using the following format: (risk) may result in (risk outcome). Phrasing the risks

in this way gave consistency and clarity to our risk descriptions.

The categories of risks used were:

Technical – Risks involved in working with the various software technologies.

Organizational – Risks having to do with poor organization and project planning such as scheduling or

identifying requirements.

Human Capital – Risks related to the energy of the team.

External – Dependency risks, items that are beyond the control of the team, but would still significantly

impact the ability to deliver a final product.

The mitigation column listed steps the team would take to avoid this risk. These could be one or more

bullet-points. Our risk framework was modified, or added to, at the end of each Sprint. This framework

and process was based on the system recommended by The Project Management Institute (Lavanya &

Malarvizhi, 2008).

3.2.3 Requirements Gathering

Sprint Planning Meetings

Sprint Planning Meetings are designed to determine and value the upcoming Sprint’s User Stories. The

team’s plan was to follow the set Scrum standard as well as revise Epics and Themes if necessary.

Sponsor Communications

Daily Scrums

Daily Scrums are quick stand-up meetings designed to regularly inspect how the team is moving towards its

project goal.

Product Demonstrations

Product demonstrations consist of the user testing out the product and giving their initial impressions.

User Interviews

User interviews are conversations with the interviewee(s) about specific topics. Our intent was to learn

about validation methods and goals.


4. Requirements Gathering

To obtain our project requirements and adhere to the Scrum methodology, we planned each Sprint's

User Stories, scheduled regular sponsor communication, and conducted interviews with the firm’s

accountants.

4.1 Sprint Planning Meetings

The team performed a Sprint planning meeting at the beginning of each Sprint. These meetings were

designed to determine and value the upcoming week’s User Stories and revise the Epics and Themes.

4.2 Sponsor Communication

4.2.1 Daily Scrums

To maintain clear communication with the sponsor, the team conducted Daily Scrums each morning in

the project sponsor’s office. Each team member relayed what they had completed, what they aimed to

do, and any blockers that prevented them from moving forward. After the team discussed their

progress, the sponsor suggested improvements and new requirements to help us create new User

Stories.

4.2.2 Product Demonstrations

Throughout the project, the team scheduled demonstration sessions with the sponsor, so that the team

could receive feedback. The sponsor assessed the dashboard without team commentary to test whether

the product was intuitive, clear, and accurate. The sponsor’s feedback proved extremely useful as each

demonstration revealed what to keep, what to change, and what to eliminate.

4.2.3 Interviews

The project sponsor informed us that our project would assist accountants in their validation of monthly

reports. To practice user-centered design, we scheduled interviews with accountants. We aimed to learn

about their validation practices and workflow. Our interview notes can be found in Appendix C and D.

Accountant Interview 1

The team initially met with the project sponsor and two accountants: Firm Accountant 1 and Firm

Accountant 2. The team asked the accountants about their current validation system and how they

typically prioritize their checks. After the accountants showed the team their Excel processes, they sent

a follow up email listing their validation procedure. The notes from the interview can be found in

Appendix C.

Accountant Interview 2

The team then met with accountants Firm Accountant 1 and Brennan Canese and asked them to demo

the dashboard. The accountants liked the drill down feature, which allowed them to see detailed

information for each Strategy on the transaction level. They also revealed that the list of validation

checks needed to show the supporting transactions behind each check. They found it useful to display

human-readable commentary on gains and losses. This functionality was like the reporting capability

provided in the ‘IRR Analytic Report,’ which was created with manual processes in the accounting

department. The notes from this interview can be found in Appendix D.


5. Analysis

Our meetings with the accountants enabled us to gain insight on how the report generator and

dashboard could help them in their day-to-day tasks. The meetings helped us gather and establish

requirements so that we could create epics and themes to execute the necessary tasks.

5.1 Epics & Themes

Epic 1: Improve Winners Losers Report Generator

This Epic was focused on the improvement of the Winners and Losers Report generator. The sponsor

requested the ability to add a column to this report. Adding a column meant that the report needed to

be able to take another field as input, manipulate it for formatting, and paste it into a new location in

the Winners and Losers Report.

Epic 1 Theme

1. Update Documentation (Document W-L Reporting)

After the code was understood, the documentation could be updated. User stories which

pertained to the development and revision of documentation were added to this category.

Epic 2: Azure Validation Dashboard (Validator)

The main goal of this thread was to deliver a dashboard which provided accountants and other

members of the team at the firm with the ability to view, manipulate, and understand financial data in

new ways.

Epic 2 Themes

1. Validate Data (Validate Data/Alerts)

The firm’s accountants perform validation checks on the firm’s transaction history. This

validation procedure flagged both material and immaterial issues in the data loaded into the

Azure Validation Dashboard. Additionally, any flags that were raised were supported with their

respective transaction information.

2. Present Interactive Raw Data (Power BI RAW/Explorer)

To provide an understanding of the base numbers for our calculations, we connected the raw

financial data in the data lake to an interactive view in Power BI. This raw data built confidence

in the accuracy of the analysis performed. The final system design used two tables for all

displays. These two tables were able to be viewed and filtered in their raw forms. Additionally,

developers could see all the raw data at Power BI’s disposal by viewing the data or model views

in Power BI.

3. Generate Performance Commentary (Summary Information/Perf Summary)

Human-readable commentary was generated to provide a more understandable narration of

changes in gross profit. This commentary concatenates the Deal Name, Strategy, and most

recent month over month difference in gross profit into an intelligible sentence. The user also

had the ability to drill through the performance commentary and view transactions which

contributed to notable changes in gross profit. The performance commentary was presented on

the Strategy level.


4. Integrate data lake to Power BI (Integration and Automation)

This theme involved the steps required to create a connection between the backend (Datalake)

and the frontend (Power BI). This connection caches all the data in Power BI via refresh in the

Power BI interface.

5. Design User Experience (User Experience and Design)

A large portion of this project focused on how to best display the data on the front end, so that

the user was informed but not overwhelmed. To do this, we created User Stories focused on

what individuals wanted to see and how they wanted to see the displays refined for future

releases.

6. Write Documentation (Documentation)

A goal of our project was to write code which could be maintained in the future. To do this, we

produced documentation to ensure that users and developers were well informed of the

capabilities and design of the dashboard system.

5.2 User Stories

Since Epic 1 was a continuation of a previous MQP project, most User Stories for Epic 1 consisted of

setting up our development environments, analyzing the code, running tests on the system, and

producing a tutorial video.

For Epic 2, we broke down our themes based on the different sections of the dashboard. We then made

User Stories for each theme. Each theme required User Stories that took place in both Databricks and

Power BI. Some User Stories focused on research, as we were not as familiar with some of the

technologies such as Power BI and Pandas. All User Stories are listed in Appendix A and throughout the

paper.


6. Design

In order to execute the two threads, we utilized Azure Datalake to store and maintain relevant data. We

also used Databricks with Spark to perform calculations and manipulate the data. Lastly, we worked with

Power BI to visualize our data and insights. We utilized specific design patterns to produce modular and

well documented code and developed multiple iterations of Databricks notebooks. Additionally, the

Power BI dashboard went through a series of top down design changes as the capabilities and

limitations of the programs involved became better understood by the team. To explain our design

choices and how the program is structured, we created a series of diagrams and descriptions. Then, we

explained how the user interface looks and functions.

6.1 System Architecture

To query the data lake from Databricks, we had to understand the firm's cloud infrastructure. As seen in

Figure 6.0, we learned that a script fetches raw transaction data from the Geneva Accounting System via

the Active Batch Scheduler and then prepares it to be stored in the Azure Data Lake. Then, additional

scripts convert the data into Delta tables which can be manipulated in Databricks. By loading the tables

into the data lake, Power BI can import the tables and display them as visualizations.

Figure 6.0 System Architecture Diagram

6.2 Data Flow Diagram (DFD)

The following figures describe how data is processed and flows throughout the Azure Validation

Dashboard system. Three levels of detail are provided. The Context Diagram presents the process from a

high level, with the entire system represented by one process which views financial performance. The next diagram, Level 0, goes into more detail on how the data moves between different systems and


external entities in the process. The diagram breaks out the front-end viewing processes and the back-end data analysis processes, and summarizes the data which flows between them.

Figure 6.1 Context and Level 0 Diagram

Finally, the most detailed diagram is the level 1 diagram which introduces data stores. In this diagram,

one can see how data flows for the main processes and views in the dashboard. Most viewing processes

simply access locally cached data from the Power BI Datastore (D2). When all the data is refreshed

(process 1.0) the backend Databricks processes are triggered to run and update the data in Power BI.

The updates from the Geneva Accounting System (external entity) are currently scheduled by the firm.


Figure 6.2 Data Flow Diagram Level 1

6.3 Entity Relationship Diagram (ERD)

To create the back-end table for the Power BI dashboard, the team needed to understand which tables

to access in the data lake. Figure 6.3 displays the three tables that were accessed. The tables are not

connected on shared keys, but they are uploaded to the data lake using prebuilt scripts.


Figure 6.3 As Is Data Lake Entity Relationship Diagram

The figure below shows the additional tables that were generated from sections of the former tables.

Even though the generated tables default.results_and_flows and default.irr_timeseries contain data

from the other tables, they are not joined in SQL. Instead, we created them using Pandas Data Frames, converted them to Spark Data Frames, and then uploaded them to the data lake. We created two tables

because results_and_flows analyzes the selected PeriodEndDate and the previous month, while

irr_timeseries examines the data from inception to the selected PeriodEndDate. By having these two

time ranges, we could set Power BI’s drillthrough functionality to exclusively show what transactions

contribute to validation checks pertaining to month over month changes.


Figure 6.4 New Data Lake Entity Relationship Diagram
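As a rough illustration of the upload step described above (a sketch, not the team's exact notebook code): a summary table is built as a Pandas Data Frame, converted to a Spark Data Frame, and saved to the data lake as a Delta table so that Power BI can import it. The column values are placeholders, and the table name follows the naming used in this section.

    import pandas as pd

    # Build the summary table in Pandas (placeholder rows for illustration only).
    results_and_flows_pdf = pd.DataFrame({
        "StrategyCode": ["BRKT:0005", "BRKT:0007"],
        "PeriodEndDate": ["2019-08-31", "2019-08-31"],
        "GrossProfit": [1_300_000.0, -250_000.0],
    })

    # 'spark' is the SparkSession that Databricks provides in every notebook.
    results_and_flows_sdf = spark.createDataFrame(results_and_flows_pdf)

    # Save as a Delta table in the data lake so Power BI can import it on refresh.
    (results_and_flows_sdf
        .write
        .format("delta")
        .mode("overwrite")
        .saveAsTable("default.results_and_flows"))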

Since the above figure does not show the relationships between tables converted into Pandas Data

Frames, we created an entity relationship diagram on the Pandas Data Frame level. For the

results_and_flows table, we merged reporting.irr_results and reporting.irr_mod_cashflows to show

summary values such as IRR, MOIC, and GrossProfit as well as the transactions that contributed to them

over the PeriodEndDate selected and the previous month. Many of the additional tables merged into

the central table are alert tables created from the irr_results and irr_mod_cashflows Data Frames. The team had to merge all alerts into the central table because of the limitations of Power BI. Although

Power BI offers powerful visualizations and useful functionality such as drillthrough and drilldown,

Power BI can only join tables on one attribute. Thus, to use certain features such as drillthrough for

alerts, we had to merge all alerts into one table.
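A hedged sketch of this merge step (not the production notebook): each alert table is reduced to its keys, given a Boolean flag column, and left-joined onto the central results_and_flows Data Frame. The key and column names are assumptions based on this chapter.

    import pandas as pd

    def attach_alert(central: pd.DataFrame, alert: pd.DataFrame, flag_name: str) -> pd.DataFrame:
        """Left-join an alert table onto the central table and mark the flagged rows."""
        keys = alert[["StrategyCode", "PeriodEndDate"]].copy()
        keys[flag_name] = True
        merged = central.merge(keys, on=["StrategyCode", "PeriodEndDate"], how="left")
        merged[flag_name] = merged[flag_name].fillna(False)
        return merged

    # Usage, repeated once per alert table:
    # results_and_flows = attach_alert(results_and_flows, remmv_no_txn_alert,
    #                                  "RemMV_change_no_transaction")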


Figure 6.5 results_and_flows Data Frame Entity Relationship Diagram

Instead of providing alerts like in Figure 6.5 for the selected PeriodEndDate and the previous month, Figure 6.6 was used to calculate historical estimations based on data from inception to the selected

PeriodEndDate. These values include the mean, standard deviation, month over month change, and

linear regression estimate for the next PeriodEndDate.
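The sketch below is an illustration under assumed column names, not the project's code; it shows how these historical values could be computed per Strategy from an irr_timeseries-style Pandas Data Frame, including a simple linear-regression projection for the next PeriodEndDate.

    import numpy as np
    import pandas as pd

    def historical_stats(ts: pd.DataFrame, metric: str = "GrossProfit") -> pd.DataFrame:
        rows = []
        for strat, grp in ts.sort_values("PeriodEndDate").groupby("StrategyCode"):
            values = grp[metric].to_numpy(dtype=float)
            # Fit a line over the month index to project the next PeriodEndDate.
            if len(values) > 1:
                slope, intercept = np.polyfit(np.arange(len(values)), values, 1)
            else:
                slope, intercept = 0.0, values[-1]
            rows.append({
                "StrategyCode": strat,
                f"{metric}_mean": values.mean(),
                f"{metric}_std": values.std(ddof=1) if len(values) > 1 else 0.0,
                f"{metric}_mom_change": values[-1] - values[-2] if len(values) > 1 else np.nan,
                f"{metric}_lr_next": slope * len(values) + intercept,
            })
        return pd.DataFrame(rows)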

Figure 6.6 irr_timeseries Data Frame Entity Relationship Diagram

6.4 Use Case Diagrams

The following figure describes the use cases for the Azure Validation Dashboard. The three main use

cases are: Analyze Historical Performance, View Alerts, and View Commentary. These uses are featured

prominently in the user interface. Also illustrated is the drillthrough use case. Drillthrough functionality


is included in the View Alerts and Commentary use cases. This drillthrough routes the user to the Raw

Data page. This page is represented by the View Raw Data 2 Month use case.

Figure 6.7 Use Case Diagram

6.5 User Interface Structure Diagram

We constructed a user interface structure diagram to show how each view is connected. Users start at

the home view which acts as a launch pad to see different analyses. When a user goes to view Alerts or

Commentary, the user can drill through on an entry in the tables and navigate to Raw Data 2 Month for

supporting data. When a user goes to the History screen, the user can select further analyses derived

from Raw Data ITD. Explanations for each screen can be found in the User Experience section. Another

version of this diagram that includes every alert page can be found in Appendix F.


Figure 6.8 User Interface Structure Diagram

6.6 User Experience

6.6.1 Home

The home display of the Azure Validation Dashboard was designed to give the user a general overview of what the program is capable of and to serve as an entry point to its three main uses. The top of the

page displays the date for which the report was generated. In all instances, this date is automatically set

to the latest available date in the alerts table (results_and_flows). As seen in the figure below, the firm’s

logo is to the left of the date and to the right is the page name. The scroller lies below the date display

and provides a preview of the data in the commentary section. The three main functions are denoted by

large clickable panes which lead the user to their respective landing pages. These panes are titled

Commentary, Alerts, and History.


Figure 6.9 Power BI Home

6.6.2 Commentary

The Commentary Page was designed to have all the capability of the IRR Analytic report. The IRR

Analytic is a report manually created by the accounting department each month which describes the

biggest positive and negative changes in gross profit across different regions and time periods on a per

deal (Strategy) basis. These gains and losses are described in easy to read phrases using the following

syntax: [Deal Name] [Strategy] [gross profit] [gain or loss]. For example: ‘Apple Computer (BRKT:0005)

1.3mm gain.’ The wording in the original report is ‘contributors’ and ‘detractors’ for the biggest gainers

and losers, respectively for a given region or time period. Our report has a table on the left for the

biggest contributors and a table on the right for the biggest detractors. These tables can then be filtered

on region and return period.
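A minimal sketch of how such a phrase can be assembled (illustrative only; the field names follow the conventions used elsewhere in this report):

    def commentary_phrase(deal_name: str, strategy_code: str, gp_diff: float) -> str:
        """Build a human-readable phrase such as 'Apple Computer (BRKT:0005) 1.3mm gain'."""
        direction = "gain" if gp_diff >= 0 else "loss"
        magnitude_mm = abs(gp_diff) / 1_000_000  # express the move in millions
        return f"{deal_name} ({strategy_code}) {magnitude_mm:.1f}mm {direction}"

    # commentary_phrase("Apple Computer", "BRKT:0005", 1_300_000)
    # -> 'Apple Computer (BRKT:0005) 1.3mm gain'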

Figure 6.10 Power BI Commentary

*firm logo


6.6.3 Alerts

The Alerts Page contains all the validation checks performed on the data. Each button leads to a

different alert type as described on that button label. On the right of the page, the number of Business

Units, Portfolios, Strategies and Sycodes are displayed. These numbers are shown in order to give the

user an understanding of what data was analyzed. As a user of this program at the firm, the user is

expected to know how many business units and portfolios exist. Therefore, if the numbers displayed on

this page are radically different than what is known, it is a sign that the program may have

malfunctioned.

Each alert page has a table with relevant identifiable information for that alert. All entries which were

flagged with that alert for the given month will appear in the alert table. Each alert entry has

drillthrough capability. This means that users can right click and select the drillthrough option to be

directed to the raw data page where they can view all the transaction level information for that flagged

row.

There are 17 alerts, broken up into 6 categories. The categories, alerts, and descriptions are listed below; a sketch of how one such check might be implemented follows the list:

• Transactions

o RemMV Change, but no transaction: Outputs Strategy codes that have a remMV change

over the previous month and do not contain any significant transactions (any

TransactionType that contains the string 'buy', 'sel', and 'AccountingRelated').

o RemMV same, but transaction exists: Outputs Strategy codes that have no change in

remMV over the previous month and contain any significant transactions (any TransactionType that contains the string 'buy', 'sel', and 'AccountingRelated').

o Monetized Strategy code with Transactions: Identifies Strategy codes that have been

monetized, have a quantity of zero when transaction type is total terminal value, and

have any other transactions for a given month.

o Gross Profit Changed, but No Transaction: Identifies Strategy codes that have a

GrossProfit change over the previous month and do not contain any significant transactions (any TransactionType that contains the string 'buy', 'sel', and 'AccountingRelated').

o Gross Profit Same, but Transaction Exists: Outputs Strategy codes that have no change

in GrossProfit over the previous month and contain significant transactions (any TransactionType that contains the string 'buy', 'sel', and 'AccountingRelated').

• IRR, MOIC Breaks

o Negative IRR Change, Positive MOIC Change: Identifies Strategy codes that have a

positive MOIC and Negative IRR change over the previous month.

o Negative MOIC Change, Positive IRR Change: Identifies Strategy codes that have a

negative MOIC and positive IRR change over the previous month.

o MOIC < 1, IRR Positive: Identifies Strategy codes that have a MOIC less than 1 and an IRR

that is positive.

• Missing Data

o Missing Begin Date: Sycode level analysis which determines if the begin date field is null

• Sycode


o SyCode Price Inconsistencies Across Portfolios: Identifies SyCode Price inconsistencies

across portfolios for a given month.

o Sycode: One-to-Many Strategies: Lists Sycodes that belong to multiple Strategies.

o Sycode Price Change Month Over Month: Lists SyCode month over month (MoM)

changes over the previous month (in any Sycode-StratCode pair), if a current and

previous month exist.

• Monetized

o Ongoing, but Listed End Date: Strategy codes that are ongoing and have an end date that is not the current PeriodEndDate.

o Not Monetized and RemMV is 0: Identifies Strategy codes that have a

Total_Terminal_Value/remMV of 0 and are not monetized.

o RemMV Not 0, but Listed as Monetized: Finds Strategy codes that are monetized and

contain a non-zero remMV.

o Monetized, No Listed End Date: Identifies Strategy codes that are monetized and do not

have an end date.

• Strategy

o New Strategy Codes: Identifies Strategy codes that exist in the given PeriodEndDate, but

do not exist in the previous month.
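To illustrate the style of these checks, the sketch below implements a check in the spirit of 'RemMV Change, but no transaction' in Pandas. It is a simplified illustration: the column names (PeriodEndDate, StrategyCode, remMV, TransactionType) follow this chapter's conventions, the join keys are assumed, and the production logic may differ.

    import pandas as pd

    SIGNIFICANT = ("buy", "sel", "AccountingRelated")

    def remmv_change_no_transaction(results: pd.DataFrame, cashflows: pd.DataFrame,
                                    period: str, prev_period: str) -> pd.DataFrame:
        """Flag Strategy codes whose remMV changed although no significant transactions exist."""
        cur = results[results["PeriodEndDate"] == period].set_index("StrategyCode")["remMV"]
        prev = results[results["PeriodEndDate"] == prev_period].set_index("StrategyCode")["remMV"]
        diff = (cur - prev).dropna()
        diff = diff[diff != 0]

        txns = cashflows[cashflows["PeriodEndDate"] == period]
        pattern = "|".join(SIGNIFICANT)
        significant = txns[txns["TransactionType"].str.contains(pattern, case=False, na=False)]
        with_txns = set(significant["StrategyCode"])

        flagged = [code for code in diff.index if code not in with_txns]
        return pd.DataFrame({"StrategyCode": flagged, "remMV_diff": diff.loc[flagged].values})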

Figure 6.11 Power BI Alerts

6.6.4 History

All panes in the history page use the timeseries data table as their source. Additionally, all historical

analysis occurs on three metrics: gross profit, Internal Rate of Return (IRR), and Multiple of Invested

Capital (MOIC).

The timeseries view is designed to display 1-5 deals at a time. To use this view, the Deal Name, return

period, and business unit are selected on the left. After this, the month over month changes in the three

metrics will be displayed in the table in the center of the page. Additionally, on the left the historical

values for gross profit, IRR, and MOIC will be plotted in separate charts. Below these charts the


minimum, maximum, average and projected values for each deal selected will be shown. See Figure

6.12.

Back on the history page, three other views aside from timeseries can be selected. Each of these views

shows a table which compares the selected metric to its historical average on a per Strategy basis. The

table is sorted by the absolute value of the difference between the mean historical value and the most

recent value of the metric. Each table holds additional identifiable information such as return period,

portfolio, business unit, and Sycode to assist the user in understanding where this data can be located

and what might explain the deviation between the most recent and average value.

Figure 6.12 Power BI History Timeseries

6.7 Design Patterns

Python code needs to be organized so that it is easily readable, understandable, and well documented for developers. In order to accomplish the tasks required for Thread 1, we implemented the strategy pattern (Boyanov, 2016).

6.7.1 Strategy Pattern

The strategy pattern enables an algorithm or class behavior to be changed at run time. A strategy object is created for each behavior, and the context object delegates to whichever strategy object it holds, which determines the algorithm that is run for the context object.

We also implemented the Strategy pattern during Thread 1 when we demonstrated how to add a

column to the output sheet. There are multiple levels of processing, but our final level of processing

determined how the final number or string should be displayed; the data in the output sheet was

processed and generated using a Strategy_mapping hashmap with keys such as in_millions and

monetized. The value associated with the key referred to a class that defined the logic for how that

Strategy is implemented. In order to add new columns and populate them with data in the correct

format, developers can reference the defined Strategy or create a new Strategy and reference that, as

we did with in_abs_millions.
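A hedged sketch of this structure (not the firm's actual column_Strategy.py): a Strategy_mapping hashmap whose keys name a formatting behavior and whose values are small strategy objects, so the generator can select the formatting algorithm per column header at run time. The formatting rules shown are illustrative.

    from abc import ABC, abstractmethod

    class ColumnStrategy(ABC):
        @abstractmethod
        def format(self, value):
            """Return the value as it should appear in the output sheet."""

    class InMillions(ColumnStrategy):
        def format(self, value):
            return round(value / 1_000_000, 1)

    class InAbsMillions(ColumnStrategy):
        def format(self, value):
            return round(abs(value) / 1_000_000, 1)

    class Monetized(ColumnStrategy):
        def format(self, value):
            # Illustrative rule: a zero remaining value is treated as monetized.
            return "Monetized" if value == 0 else "Ongoing"

    Strategy_mapping = {
        "in_millions": InMillions(),
        "in_abs_millions": InAbsMillions(),
        "monetized": Monetized(),
    }

    # The report generator (the context) looks up a strategy by header key and delegates
    # formatting to it, e.g. Strategy_mapping["in_abs_millions"].format(-1_300_000) -> 1.3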


7. Implementation

Pre-Qualifying Project Work

Before starting work on site at the firm's office in New York City, we prepared by performing background

research as well as speaking with the project sponsor on a weekly basis. Regular communication with

the sponsor helped us develop a preliminary understanding of the project. Also, these early

conversations helped us develop a project plan and identify basic project requirements. However, it was

difficult to conduct extensive research without access to the software environment.

Sprint 1

User Stories Completed:

As a firm analyst, I want to add columns in the Excel template, so that I don't have to manually edit the report.

As a firm analyst, I want to delete columns in the Excel template, so that I don't have to manually edit the report.

As a firm analyst, I want to modify columns in the Excel template, so that I don't have to manually edit the report.

As a firm analyst, I want to populate the modified template with data corresponding to the column names, so that I don't have to manually input data into the report.

As a firm employee, I want to learn how to use the report system, so that accounting can manually produce reports.

Sprint Review

This week was spent entirely on our initial setup at the firm and Thread 1. After gaining access to the

codebase, we learned that adding or modifying columns in the report required only minimal

modifications to the code. The prior MQP designed the program with this functionality in mind. After

presenting this to our sponsor, he recommended that we revise the documentation to better describe

this functionality and modify the template by adding a column. We created a video tutorial and made

significant revisions and updates to the documentation to better describe this process. These changes and the video were pushed to the Git repository by the end of the week.

No User Stories were rolled over or left incomplete for this Sprint.

Story Points Completed: 104

Hours Worked: 116.5

Velocity: 89.27%

Sprint Retrospective Meeting

What Worked Well

• Our sponsor was happy to meet with us every morning which kept our communication clear and easy.

• The onboarding process was faster and smoother than expected.


• The team was able to adapt to fast changes in scope and direction as they unfolded for Thread 1.

• Subject matter experts on Python and Azure seem to be somewhat available to assist us with this project.

• Coming into work early meant we had to spend less time commuting.

What Could be Improved

• No items were rolled over in the backlog. As we learned more about the scope of Thread 2, we

should have built a larger backlog.

• The development environment did take some time to set up, which slightly slowed our progress; this was expected and likely will not be an issue after this week.

• We need to improve how information is shared on the team regarding code changes and how

we can all collaborate on code.

• We need to be aware of when parallel work is needed, so everyone has the same base of

knowledge.

• We need to work on unifying syntax and procedures in code.

Sprint 2

User Stories Completed:

As a firm analyst I want to know if a Strategy switched from being a gain to a loss or vice versa so I can

recognize performance changes which affect the overall fund.

As a firm accountant I want to know if any terminal values changed when there was no buying or selling

activity because this is indicative of incorrect Data copy.

As a firm analyst I want to know if any terminal values went to 0 over the last month so I am aware of

any new closed positions.

As a firm accountant I want to know if remMV values changed when there was no trading activity so

that I can check if the data is correct.

As a firm analyst, I want to see the biggest month over month change in IRR at the Strategy code level

over the last N months, so I can make an informed decision about investing.

As a firm analyst, I want to see the biggest month over month change in MOIC at the Strategy code level

over the last N months, so I can make an informed decision about investing.

As a firm analyst, I want to see the biggest month over month change in GrossProfit at the Strategy code

level over the last N months, so I can make an informed decision about investing.

As a firm analyst, I want to know the "Buy and Sell" transactions over the last month, so I can make an

informed decision about investing.

As a firm trainee, I want to find the difference in IRR over 1 month, so that I can learn how to use

DataBricks and interact with the DataLake.


Project Risks Sprint 2

Sprint Review

Early in the week we had some difficulty with a rapidly changing scope for Thread 2. Initially we

interpreted this project as the need to create a report from scratch (Monday). We spent Monday on Sprint planning and on planning the overall structure of how the report would be built, using our existing understanding of the firm's database systems. Tuesday, we learned that we would be providing

validation checks on data as it is loaded into the database. These procedures would be run daily from

the start of each month to determine what data is in the Datalake and what still needs to be added for

the report at the end of the month. This week our sponsor gave us many tasks involving insights he

would find useful to extract from the database. Although slightly challenging without business context

or an understanding of the database structure, we were able to accomplish most of these tasks. We are

learning a lot about the firm’s development environment. Additionally, because there were so many

tasks, the parallel work issue has been resolved.

Tasks added to the backlog include finding strategies that have a total terminal value of 0 and have not

been monetized. This is because we were unable to deliver the User Story to the exact specifications of our sponsor. We are close to completion on this, and it will be easy to finish quickly next week.

Story Points Completed: 152

Hours Worked: 230

Velocity: 66.09%

Sprint Retrospective Meeting

What Worked Well

• Weekly updates to advisors seem to be appreciated and will continue in order to ensure all

parties understand our progress and status on the project

• Databricks/Azure lake are easy to use because Python is user friendly and Python notebooks

make code very easy to debug.

• Communication with firm staff continues to go well as we speak to other members of the firm.

• Less parallel work because of the wider range of tasks.

• More tasks given by sponsor which left our team more room to plan the project


What Could Be Improved

• Be more agile -- less time planning, more time doing.

• Find different places to work; others have arrived in the space we are working in, and they do not appreciate our chatter.

Sprint 3

User Stories Completed:

As a firm accountant, I want to be able to see the largest difference in IRR between two months so that I

do not have to manually find it.

As a firm accountant, I want to be able to see the largest difference in Gross Profit between two months

so that I do not have to manually calculate it.

As a firm accountant, I want to know which Strat codes changed from ongoing to monetized from one

month to the next to understand which stratcodes affect overall fund performance.

As a firm dev I want to know if the total terminal value is zero because if it is it should be monetized.

As an accountant I want to be able to drill down in the raw files so that I can see where the data may be

incorrect.

As a firm analyst I want to filter down to specific funds so that I can perform more accurate validation

checks.

As a developer I want to figure out how to connect Data Frames from Databricks to Power BI and create tables out of them so that I don't have to manually create a report.

As a firm accountant I want to see data points that are outside a number of standard deviations from what is normal so that I can identify extraneous data.

As a firm accountant, I want to see missing data (IRR, MOIC, GrossProfit, Total_Cost, Total_Sales, Total_Terminal_Value) in irr_results and irr_mod_cashflows for a certain month, so that I can fix them.

Sprint Review

This week our scope and objective narrowed and remained consistent. We now know that we are

building a Power BI dashboard which will be used by accountants to check the validity of the firm’s

monthly financial data. Today we sent our project sponsor our initial prototype of this dashboard.

Although still in the early stage, we are now confident enough in the definition of our project to focus

our time towards one deliverable, which contrasts with the smaller missions of last week. Some of this

can be attributed to our experience in communicating with our sponsor. On the technical side, there is

still a lot for us to learn about the development environment. Understanding how to efficiently work

with large datasets in Pandas has been particularly challenging. We have had the assistance of Robert

Dreeke and Oren Efrati in helping us to understand how to create tables in the Databricks Database and

efficiently manipulate data in Pandas, respectively. It was mentioned that next week we would be able to meet with an accountant. In preparation, we have developed a set of questions to better determine what an accountant would like to see in a data validation dashboard. We are at the halfway point of usable project time, and the project at this stage appears entirely achievable in the next three weeks.


For next week we rolled over User Stories relating to biggest month over month changes in total cost

and total sales. These items have been added to the backlog for next week and will be started in Sprint

4.

Story Points Completed: 162

Hours Worked: 143.5

Velocity: 112.89%

Sprint Retrospective Meeting

What Worked Well

• Good communication with individuals in the firm who are not the Project Sponsor.

• Less time was spent planning, and more time was spent on pursuing User Stories, which greatly improved our velocity.

• Our team has started to understand how to use the Pandas package effectively in Python.

What Could Be Improved

• Show more work to our project sponsor in context. This would help our sponsor's understanding of our progress and the value of our work.

Project Risks

Project Risks Sprint 3

Sprint 4

User Stories Completed:

As a firm accountant, I want to know the average change over any time period for MOIC, Gross Profit,

Total Cost, Total Sales, after specifying a Strategy so that I can understand changes in Strategies over

time.

As a firm accountant, I want to see the biggest month over month change in TotalCost at the Strategy

code level, so I can make an informed decision about investing.

As a firm accountant, I want to see the biggest month over month change in TotalSales at the Strategy

code level, so I can make an informed decision about investing.

As an accountant I want to be able to have a report that updates automatically, so I always have the

most up to date information.


As a firm accountant, I want to see if Strategies in funds with end dates are monetized, so that I can

determine why.

As a firm accountant, I want to know if a Strategy in a fund is monetized and whether it has no quantity

and no market value, so that I can determine why there is a notable change in the data.

As an accountant I want to see the biggest sycode move for any strat code so I can further analyze that

strat code.

As an accountant I want to see when MOIC and IRR are moving in opposite directions so I can further

analyze the story associated with it.

As an accountant I want to know when Gross Profit does not change and there are many transactions

As an accountant I want to know when RemMV changes and there are many transactions

As an accountant I want to know the month to month price changes for a sycode, so that I can see the

biggest moves in sycode price.

As a firm accountant, I want to check if there are begin dates for strategies, so that I can see why there

might be none.

As an accountant I want to see if a monetized portfolio has a terminal value OR RemMV which changes

from 0 to any number.

As a firm accountant, I want to see if strategies in funds with a terminal value of 0 are monetized, so that

I can determine why.

As a firm accountant, I want to see which strategies are new, so that I can determine which strategies do

not have previous data.

As a firm accountant, I want to see if a sycode belongs to multiple strategies, so that I can determine

how to override the data.

As a firm accountant, I want to see whether prices for sycodes change across funds, so that I can see if there were inconsistencies in the data.

Sprint Review

As a result of meeting with accountants Firm Accountant 1 and Doug Mackenzie on Tuesday, we were

able to further refine the needs of our future users. During the meeting we discussed which checks are performed on the data and the order in which they are performed, and we developed an understanding of the priority of these checks. After the meeting, Firm Accountant 1 sent the Excel files currently used to

perform these checks. Using these sheets and the recording of our meeting we created an outline of all

the validation checks to be performed on the data.


Wednesday and Thursday, we developed functions to execute these checks. Thursday afternoon and

Friday were spent integrating and re-validating these checks. Due to difficulties with integration, this took longer than expected; as a result, we were unable to ship a revised dashboard Friday. We will work

to complete this Monday and will review it with the project sponsor. After this review we will have

another meeting with Evan and possibly other members of the accounting department to receive

feedback on the dashboard.

Multiple user stories were incomplete, which prevented the dashboard from coming together at the end of the week. Some of the stories we rolled over to next week related to the human-readable commentary and drillthrough capability. Drillthrough was confused with drilldown, which resulted in a false completion of a User Story.

Story Points Completed: 168

Hours Worked: 126.5

Velocity: 132.81%

Sprint Retrospective Meeting

What Worked Well

• Improved morale

• Realistic goals were set

What Could be Improved

• Better planning for integration; it took far longer than expected because of poor planning of the code.

• Re-use others' work; there is no need to re-invent the wheel.

• Narrow scope to allow a finished product by the end of the week.

Project Risks

Project Risks Sprint 4

Sprint 5

User Stories:

As an accountant I want to see human readable commentary on which Strategy codes influenced the

portfolio, and moved the most


As a firm accountant, I only want to see BKRT, so that I can make decisions on a more meaningful

dataset

As a firm accountant, I want to stratify the data by Region, so I can gain insights about the progress of

each region.

As an MQP student, I want to structure the final paper, so it accurately describes our work at the firm.

As a firm accountant, I want to drill through alerts, so that I can prove that an alert is valid.

As a firm accountant I want to be able to use filters on every page for common fields such as portfolio,

Business Unit, Strategy, Region, Sycode, ALERT ATTRIBUTE so that I can universally filter displayed data

As a firm accountant, I want to see projected values based on historical data like standard deviation and

linear regression, so that I can determine if my numbers are in a reasonable range.

As a firm accountant I want to see the Alert Description without the Extra linked column Visible so that

the view is less cluttered

As an MQP team member, I want to refresh and update our paper: omit previous technologies used and write about the new technologies used.

As a firm accountant I want to be able to see The DealName as well as StrategyCode Because I know

Deal Names better than Stratcodes

As a firm accountant when Viewing the GP No-Change Transactions Rules I want to be able to drill down

to transactions

As a firm accountant I want Extreme IRR Values to be Filtered Out (Perhaps greater than 1000) Before

the Standard Deviation Is calculated So that I only see useful Data

As a firm trainee I want to meet with users of the dashboard to better understand their needs

Sprint Review

This Sprint we showed our progress to our sponsor twice and had another meeting with two members

of the accounting team. These sessions were brief but helped us design the Power BI interface. We used

these meetings to hear direct feedback on the state of our dashboards. As a result of a meeting early in

the week, the data structure of our project had to be consolidated to enable the 'Drillthrough' feature in

Power BI. Also, as a result of this meeting we were informed of additional tables in the database which

specify deal-name and region. The accountants use these fields very often, so joining them to our main

table will make the data far easier to understand and manipulate.

During our last stand up with our sponsor this week we received feedback focused mostly on the

presentation of the data in Power BI. Usability is a key sponsor concern. Also of note in this meeting was a feature request to use portfolio as a filter in one of our pages.

Our team has concerns about the feasibility of implementing this capability because of the complexity it would introduce in the back-end data processing. We will communicate our concerns at the start of

Sprint 6. Scheduling would be affected, and it has become a priority to minimize changes to the backend

for two reasons: First, we need to focus time on improving the existing Power BI interface and secondly


because changes to our processing and organizing of the data on the backend have the potential to

break our interface. We also started the outline and planning of our final paper.

Many of the items added to the backlog this week were larger formatting tasks which will not be confirmed to be done until the project is nearly complete. For instance, when a new table is added in Power BI, grand totals are added by default. Until no more tables are added, we cannot be sure that no extraneous and irrelevant grand total fields remain. The same applies to broader formatting uniformity, such as making all the headers the same.

Story Points Completed: 151

Hours Worked: 151.15

Velocity: 99.99%

Sprint Retrospective Meeting

What Worked Well

• Using the pair programming technique allowed for more effective collaboration and better

communication across the team.

• Splitting tasks and User Stories amongst the team has become easier and more natural.

• Working more in the front end is gratifying.

What Could Be Improved

• Estimating the time a task needs to be completed and conveying that number to others: essentially, performing better real-time story point allocation and communicating when tasks are in progress and perhaps running long.

• Communicating current objectives casually could be improved so that all members of the team

have a sense of direction.

• Planning Power BI usage for versioning and collaboration purposes is important because only one person can edit it at a time.

Project Risks

Project Risks Sprint 5

Sprint 6

User Stories:

As a firm accountant I want to Disable Grand totals on non-Applicable Fields.


As a firm accountant I want the ToolTip on Diffs to show the two values used to calculate the DIFF.

As a Project Sponsor, I want to see when a MOIC and IRR are different in my own terms, so that I can be

alerted when it happens.

As a firm accountant I want the Closed_fund_transactions field to be renamed as the

monetized_stratcode_with_transactions and to only check for non-null values in the RESID Column.

As an accountant, I want to see DealName and StrategyRegionofRisk as columns and as filters so I can

effectively analyze the data and utilize the PowerBI Dashboard.

As an accountant, I want to be able to filter on portfolio (including ALL), so that I can assess strategies on

a general level.

As an accountant I want to see the absolute value of all Diffs so that I can sort them.

As an accountant I want to see relevant usable filters on each report page so that I can filter the data

appropriately.

As an accountant, I want to just see values where the alert is true, so I see data respective to that alert.

As an accountant I want a Top Level Summary page that contains the data and a well organized way to

access alerts.

As a developer, I want to learn how to properly use the slicer to arrange data to the accountant's

satisfaction.

As an accountant, I want to see a flat list of transactions: raw data.

As a user of the PowerBI dashboard, I want the column names to be easier to understand, so I can better understand how the data is represented.

As an accountant, I want to see a description of each page, so I understand how to use the data

provided and further understand the alert and its check.

As a firm developer I want to see commented Code so that I can maintain the software.

As a firm developer I want to add "changes" to IRR, MOIC, Buttons so that the RAW explorer is more

useable.

As a firm accountant I want to filter the entire Report by Investment type so that I do not see irrelevant

cash transactions.

As a project sponsor I want the headers of each page to be the same on every page so that there is

consistency in design.

As an accountant I want to see commentary for all return periods 1, 3, 5, year to date.

As a project sponsor for each dealname I want to see the Average Min Max LR for IRR, MOIC.

As a project Sponsor I want to see a graph of time series data with IRR GP MOIC all in one visual.

As a project sponsor I want to see a descending sort of a diff between current value and average Value.


Sprint Review

This Sprint was focused on revisions to the user interface. After Wednesday no more changes were

made to the backend code, and the feature we were concerned about implementing last week was

added within our time constraints. This week we went from demoing once or twice per week to nearly

every day with our project sponsor. This compressed feedback loop let us make the many small changes

needed to improve the user interface much faster. These changes focused on formatting and the overall

flow of the user through the interface. A key challenge was providing enough information to summarize

performance, while not overwhelming the user, all while also giving the user transparency into how the

values were calculated. Towards the end of the week we received some informal feature requests over

email, these features were implemented by review time Friday. We plan to not develop any further

features after this week to stay on track. Our sponsor understands this and will be working with us to

assist in refining our presentation next week.

Story Points Completed: 185

Hours Worked: 146

Velocity: 126%

Sprint Retrospective Meeting

What Worked Well

• Our goals and expectations were achievable and realistic for the time we have left.

• Scheduling of the paper allowed for early professor feedback.

• Advisor feedback is positive, which is a good indication of project status.

What Could Be Improved

• We should try to avoid pursuing low clarity instructions without asking for more information,

because it is unlikely we will be able to meet expectations.

• We need to better communicate technical limitations of Power BI.

Project Risks

Project Risks Sprint 6


Weekly Burndown


8. Testing

8.1 Quality Assurance Procedure

For Thread 1, the team used the intermediary Excel file to determine if the numbers were correctly

displayed in the Winners and Losers Report. We found key values for TotalSales and their associated

Deal Names to do a quick check on the validity of the data. In addition, the sponsor verified that the

newly produced column in the report was correct.

In Thread 2, testing was more complex than quickly determining if numbers had been copied over. Since

accountants were one of our primary users, we attempted to use their accounting procedure and

former Excel Sheets to check if our validation alerts had produced similar information. When we tried to

compare our numbers, however, we realized that the accountants' files had a series of overrides that were impractical to replicate. Our sponsor later told us not to use their numbers, as we would waste time implementing overrides. As a result, we had separate notebooks where we would redo different alert

entries using SQL queries instead of using Pandas. For example, to prove that a certain Strategy had a

GrossProfit change with no significant transactions in the given PeriodEndDate, we queried

reporting.irr_results to prove the GrossProfit change for the given Strategy. Then, we queried

reporting.irr_mod_cashflows to see that the Strategy contained no significant transactions with a

TradeDate within the time of the PeriodEndDate. In addition, we produced sanity check columns for

Power BI, so that we could see the inputs of certain calculations such as month over month changes.
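A hedged sketch of this style of spot check from a separate Databricks notebook, here for 'Gross Profit Changed, but No Transaction'. The table names come from this chapter; the column names, strategy code, and dates are placeholders.

    # Confirm the GrossProfit change for one Strategy across two PeriodEndDates.
    spark.sql("""
        SELECT PeriodEndDate, GrossProfit
        FROM reporting.irr_results
        WHERE StrategyCode = 'BRKT:0005'
          AND PeriodEndDate IN ('2019-07-31', '2019-08-31')
        ORDER BY PeriodEndDate
    """).show()

    # Confirm that no significant transactions fall within the period; the alert
    # holds if this query returns no rows while the query above shows a change.
    spark.sql("""
        SELECT TransactionType, TradeDate
        FROM reporting.irr_mod_cashflows
        WHERE StrategyCode = 'BRKT:0005'
          AND TradeDate BETWEEN '2019-08-01' AND '2019-08-31'
          AND (TransactionType LIKE '%buy%'
               OR TransactionType LIKE '%sel%'
               OR TransactionType LIKE '%AccountingRelated%')
    """).show()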

8.2 User Feedback

The team scheduled regular demos with the sponsor to receive user feedback on the accuracy and

usability of the dashboard. These regular meetings allowed us to quickly see initial reactions from the

sponsor and write new User Stories. The stories would then be used to fine tune the user interface and

back-end accordingly.

The firm accountants demoed early versions of the dashboard during interviews, which can be found in

Appendix C and D. We had two major meetings with the accountants that revealed what they value and

how they validate their data.

In the first meeting, we learned about what suspicious activity should be flagged such as IRR and MOIC

movements in opposite directions. We also learned in this meeting how the accountants use a series of

interconnected Excel spreadsheets to flag alerts and generate the commentary for the IRR Analytic

Report. In addition, we got a glimpse of their workflow and what accountants prioritize when validating

data.

In the second meeting, the accountants interacted with a prototype of the dashboard and relayed their

first impressions. They initially did not like the validation section but became interested in it once we explained the drillthrough functionality. In addition, they liked the drilldown functionality in Power BI

because it was an intuitive way to navigate the large tables.


9. Future Work

9.1 Thread 1

9.1.1 Modularize Strategies Further

Although the current code base modularizes strategies for report customization, the strategies could be modularized further in column_Strategy.py. The main limitation is that the user can only call one Strategy per header, and each strategy is highly specific to its header, so creating a new one requires intermediate coding knowledge. Developing multiple, smaller strategies that can be mixed and matched (e.g. absolute value, in millions, in billions, as a percentage) would likely make generating custom columns simpler, as sketched below. Keep in mind that some strategies may be so specific to the data set that they cannot be easily modularized.
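To make the idea concrete, here is a minimal sketch of composable column strategies. The class names and interface are hypothetical; they are not the classes currently in column_Strategy.py.

```python
# Hypothetical sketch of small, composable column strategies.
# These class names are illustrative only and do not exist in column_Strategy.py.
from abc import ABC, abstractmethod


class ColumnStrategy(ABC):
    @abstractmethod
    def apply(self, value: float) -> float:
        ...


class AbsoluteValue(ColumnStrategy):
    def apply(self, value: float) -> float:
        return abs(value)


class InMillions(ColumnStrategy):
    def apply(self, value: float) -> float:
        return value / 1_000_000


class AsPercentage(ColumnStrategy):
    def apply(self, value: float) -> float:
        return value * 100


class ComposedStrategy(ColumnStrategy):
    """Chains several small strategies so one header can mix and match them."""

    def __init__(self, *strategies: ColumnStrategy):
        self.strategies = strategies

    def apply(self, value: float) -> float:
        for strategy in self.strategies:
            value = strategy.apply(value)
        return value


# A header could then be configured with, for example, absolute value in millions.
total_sales_strategy = ComposedStrategy(AbsoluteValue(), InMillions())
print(total_sales_strategy.apply(-12_500_000))  # prints 12.5
```

With this structure, a user could assemble a column's formatting from a small catalog of building blocks instead of writing a new header-specific class.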

9.1.2 Modularize Pre-Processing Functions Further

While implementing the “All Other Positions” portion of the “Invested Capital” column in preprocessing_factory.py, we noticed that the file is organized similarly to column_Strategy.py. However, the code that generates “All Other Positions” is less modular because it is a helper function called by the class ConcatLowerBPSProcessing(). Creating multiple, more modular pre-processing functions that can be mixed and matched may therefore allow users to customize the template more easily, although some pre-processing functions may be so specific to the template that they cannot be easily modularized. A sketch of this idea follows.
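A minimal sketch of that idea using pandas pipelines; the function names, threshold, and columns are made up for illustration, and the real logic in ConcatLowerBPSProcessing() would need to be carried over.

```python
# Hypothetical sketch of modular pre-processing steps chained with DataFrame.pipe().
# Function names, columns, and the threshold are illustrative only.
import pandas as pd


def drop_lower_bps_rows(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    # Keep positions at or above the threshold; the rest are aggregated below.
    return df[df["BPS"] >= threshold]


def append_all_other_positions(df: pd.DataFrame, full_df: pd.DataFrame,
                               threshold: float) -> pd.DataFrame:
    # Collapse the removed rows into a single "All Other Positions" row.
    others = full_df[full_df["BPS"] < threshold]
    summary = {"DealName": "All Other Positions",
               "TotalSales": others["TotalSales"].sum(),
               "BPS": others["BPS"].sum()}
    return pd.concat([df, pd.DataFrame([summary])], ignore_index=True)


raw = pd.DataFrame({"DealName": ["A", "B", "C"],
                    "TotalSales": [5.0, 1.0, 0.2],
                    "BPS": [120.0, 40.0, 5.0]})

processed = (raw
             .pipe(drop_lower_bps_rows, threshold=10.0)
             .pipe(append_all_other_positions, full_df=raw, threshold=10.0))
print(processed)
```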

9.1.3 Determine User Base

The current generator relies heavily on both the initial Excel template and Python code. As a result, potential users who are not familiar with Python and software development may have trouble editing the code base to suit their needs. Since the firm may ask non-technical employees to perform report generation in the future, it is important to determine who will be using the software before further development. That determination will dictate how to develop the software in a manner that is easy and appropriate for the user base; depending on the user base, a purely Excel or purely Python implementation may be needed.

9.2 Thread 2

9.2.1 Add More Timeseries Data to Datalake

As of December 2019, the data lake only contained PeriodEndDates from mid-2018 to late 2019. To develop the dashboard, we examined the 7/2019 and 8/2019 periods, because 9/2019 and 10/2019 did not yet have as much data. Adding more data would therefore allow for analysis of the most recent PeriodEndDate as well as more accurate historical analysis. In addition, more data could be used to train a machine learning model and perform further analysis.

9.2.2 Schedule Script

We designed the dashboard to support accountants in their validation of the latest PeriodEndDate. To put the Azure Validation Dashboard into production, we recommend that the firm run the script on a schedule for the latest PeriodEndDate, using the PeriodEndDate selector widgets in Databricks.
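As one possible approach, the notebook could read the selected date from its widget so that the same code runs both interactively and as a scheduled Databricks job (job parameters can override widget values). The widget name, default date, and query below are assumptions, not the firm's actual configuration.

```python
# Sketch of reading the PeriodEndDate widget in a Databricks notebook.
# The widget name "period_end_date" and the default value are assumptions.
dbutils.widgets.text("period_end_date", "2019-08-31", "PeriodEndDate")
period_end = dbutils.widgets.get("period_end_date")

# Downstream cells can then filter on the selected period, for example:
results = spark.sql(f"""
    SELECT *
    FROM reporting.irr_results
    WHERE PeriodEndDate = '{period_end}'
""")
```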


9.2.3 Add More Alerts and Analysis

As of December 2019, we had built 17 alerts into the Power BI dashboard and laid the foundation for others. For example, we calculated a linear regression prediction of GrossProfit, IRR, and MOIC for the latest PeriodEndDate; an additional alert could be designed to flag the difference between the actual and predicted values for the latest PeriodEndDate.
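A minimal sketch of what such an actual-versus-predicted alert could look like in Pandas. The column names, the 10% threshold, and the sample data are assumptions; the regression here is a simple fit against a time index rather than the firm's own calculation.

```python
# Hypothetical actual-vs-predicted alert based on a per-strategy linear fit.
# Threshold, columns, and sample data are illustrative only.
import numpy as np
import pandas as pd


def predicted_vs_actual(group: pd.DataFrame, threshold: float = 0.10) -> pd.Series:
    group = group.sort_values("PeriodEndDate")
    history, latest = group.iloc[:-1], group.iloc[-1]
    # Fit GrossProfit against a simple time index over the historical periods.
    x = np.arange(len(history))
    slope, intercept = np.polyfit(x, history["GrossProfit"].to_numpy(), 1)
    predicted = slope * len(history) + intercept
    actual = latest["GrossProfit"]
    deviation = abs(actual - predicted) / max(abs(predicted), 1e-9)
    return pd.Series({"Predicted": predicted,
                      "Actual": actual,
                      "PredictionAlert": deviation > threshold})


# One row per Strategy per PeriodEndDate with a GrossProfit column.
df = pd.DataFrame({
    "Strategy": ["S1"] * 4 + ["S2"] * 4,
    "PeriodEndDate": pd.to_datetime(["2019-05-31", "2019-06-30",
                                     "2019-07-31", "2019-08-31"] * 2),
    "GrossProfit": [1.0, 1.1, 1.2, 1.3, 2.0, 2.1, 2.2, 5.0],
})
alerts = df.groupby("Strategy").apply(predicted_vs_actual)
print(alerts)  # S2's jump to 5.0 trips the PredictionAlert flag
```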

9.2.4 Add More Fields to Data Frame

After working in the data lake, we realized we used only a small portion of the many fields in the tables. The fields in the current dashboard are those required by our sponsor, but even more analysis could be performed if additional fields were introduced. In addition, we recommend adding all of the ReturnPeriods to the irr_timeseries script, as we only included ITD and YTD at our sponsor’s request.

9.2.5 Create Summary Page

Although we created a wide range of alert reporting pages, these pages are not prioritized or readily accessible when the dashboard is opened. In its current state, it would take at least 35 clicks to view every possible alert. Streamlining the user interface is necessary to improve the workflow and reduce the time required to view alerts, and a future redesign could reduce the number of interactions required to see the most important alerts. This is a considerable challenge, both because of the rigidity of designing in Power BI and because assigning a priority to each alert requires a deeper understanding of its relative importance.


10. Learning Assessment

10.1 Challenges

1. Identifying Requirements

Before we started our project in New York, we had only a general idea of what we had to do. For security reasons, we were not able to see how the data was structured until we got to New York. We understood that we needed to edit the previous MQP’s code and develop new ways to log and analyze data, but many of the details were not clear. When we arrived in New York, we realized that some of our requirements had changed. In Thread 1, we had planned to upgrade the report generator by implementing XML, but after looking at the code we quickly realized that the previous MQP team had already implemented a Python package utilizing XML. In Thread 2, we were tasked with completing one of the three sub-threads we had planned. Additionally, some requirements, such as using Power BI as the primary front-end tool, were not clearly established until midway through the project.

In response, we attempted to clarify requirements with the sponsor and engaged in conversations on

what we had to do. Although the conversations gave the team new insights, these insights would

occasionally conflict with other requirements. Eventually, by having regular product demonstrations

with the sponsor along with an agile mindset, the team was able to determine requirements, gain

actionable feedback, and move forward.

2. Planning vs. Execution

During the second Sprint, the team planned the project after getting the initial overview of Thread 2. We created diagrams and wrote User Stories for four hours. As we began to execute our plan, our requirements rapidly changed mid-sprint, and much of our planning no longer applied to the project. On the flip side, the team began to develop the back-end tables without considering the limitations of Power BI. Overall, the team was challenged to find the balance between planning and executing.

After experiencing both extremes, the team realized that shorter planning and execution cycles with daily feedback were most effective. By receiving our sponsor’s reactions to smaller chunks of our user interface and back-end code, we were able to align ourselves more closely with our sponsor’s needs.

3. Domain Knowledge

Although the team had some financial literacy and a rough idea of the firm’s asset organization, we struggled to understand the entirety of the system. The various in-house column headers frequently confused us as we worked on the back-end structure. Even though our sponsor clarified many terms for us, we did not interact with most of the columns in the datasets. Although many of those columns were not relevant to the project, we frequently wondered whether we were missing information. Towards the end of the project, we added Deal Name to our Data Frames at our sponsor’s request. While the task was easy to complete, the field was stored in an obscure table that we would not have found on our own.

4. Optimization

Despite having some experience with the Pandas Python library, the team had to research how to use the library correctly. Initially, the team used for-loops to analyze the data. However, we quickly learned that Pandas is built for vectorization. When trying to optimize our functions, we attempted to learn Pandas best practices, but we did not fully understand the library. We eventually asked for help from a software developer at the firm, who showed us the groupby and apply functions. By using these functions, we were able to analyze large chunks of data in much less time.

10.2 Learnings

10.2.1 Computer Science

Technologies

Throughout the project, the team learned how to adapt to and use the firm’s technologies, including the Azure Data Lake, Databricks, Pandas, and Power BI. While the team was familiar with Python and SQL, we had never programmed against the data lake or worked with Power BI. By speaking with employees of the firm, we were able to ask about the company’s best coding practices and development setup, and to get advice on how to write in Pandas and Databricks. The team, however, did not have as much support when working in Power BI; we relied on YouTube tutorials, Microsoft documentation, and experimentation to develop the final deliverable.

Optimization

Vectorization and GroupBy

During development of the various Data Frames, we understood that our commands had to run relatively quickly and take advantage of Pandas’ vectorized, matrix-style operations. At first, we used for-loops and Pandas’ row-iteration equivalents to work through the large matrices. Although our for-loop code produced accurate results, the commands were relatively slow on large datasets. A software developer at the firm suggested using groupby techniques to apply functions to an entire column or group rather than to individual rows. Once these techniques were implemented, our run times dropped dramatically. As a result, the team coded with vectorization in mind and structured the Data Frames with temporary columns to allow for quick calculations and analysis.
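The contrast, in a minimal sketch with illustrative column names and data:

```python
# Row-by-row loop versus the vectorized groupby approach that replaced it.
import pandas as pd

df = pd.DataFrame({
    "Strategy": ["S1", "S1", "S2", "S2"],
    "PeriodEndDate": pd.to_datetime(["2019-07-31", "2019-08-31"] * 2),
    "GrossProfit": [1.0, 1.4, 2.0, 1.5],
}).sort_values(["Strategy", "PeriodEndDate"])

# Slow pattern: iterate over every row and track the previous value by hand.
previous, loop_changes = {}, []
for _, row in df.iterrows():
    prior = previous.get(row["Strategy"])
    loop_changes.append(None if prior is None else row["GrossProfit"] - prior)
    previous[row["Strategy"]] = row["GrossProfit"]

# Vectorized pattern: a single groupby call computes the same change per strategy.
df["GrossProfitChange"] = df.groupby("Strategy")["GrossProfit"].diff()
print(loop_changes)
print(df)
```

On this tiny example both run instantly; the benefit appears on the large lake-sized DataFrames, where the vectorized version avoids Python-level row iteration.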

Query Optimization

Each team member worked individually on different alerts, so each person wrote their own SQL queries.

When merging the team’s code together into one notebook, we realized the commands took a

considerable amount of time. After running some tests on the code base, we learned that some SQL

queries took minutes to complete, while Pandas commands executed in a tenth of a second. As a result,

the team extracted their SQL commands and made four relatively large Data Frames at the beginning of

the program to be shared amongst the different alerts. By completing this task, we significantly cut

down our run times.
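A minimal sketch of the refactor (two of the shared tables shown; the actual queries and DataFrames in our notebooks differ):

```python
# Run each expensive SQL query once and share the resulting Pandas DataFrames
# across all alert calculations. Table names come from the report; selecting
# every column is for illustration only.
results_df = spark.sql("SELECT * FROM reporting.irr_results").toPandas()
cashflows_df = spark.sql("SELECT * FROM reporting.irr_mod_cashflows").toPandas()

# Individual alerts then filter the in-memory DataFrames with Pandas instead of
# issuing their own SQL, for example:
latest_period = results_df["PeriodEndDate"].max()
latest_results = results_df[results_df["PeriodEndDate"] == latest_period]
```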

Integration

Midway through the project, our goal was to link our alerts table with the raw data table in Power BI.

We discovered that Power BI could only join tables on one field with a 1-to-1 relationship and would

allow for drillthrough only on that one key. As various alerts needed to be joined on different sets of

keys, we realized we had to re-design our entire Data Frame. At first, we tried creating a unique

identifying key for each row, but we realized that this system would not be able to provide enough

context for drillthroughs. In a similar manner, we then tried to create a column for each alert type in the

raw data table. We also entertained the idea of creating a customized raw data table for each alert type; while that might have worked in Power BI, we quickly dismissed it because of its lack of extensibility. Eventually, we realized we had to merge our alerts table into our raw data table. We refactored our commands to allow for the merge, and by doing so we were able to avoid the Power BI join process and allow for immediate drillthrough. Through this experience, and after many failed attempts, the team learned to stay agile and to be aware of the limitations of integrating with another program.
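A minimal sketch of the final shape, with illustrative columns and a single made-up alert flag:

```python
# Merge per-alert flags directly into the raw data table so Power BI can drill
# through without a cross-table join. Columns and the alert name are illustrative.
import pandas as pd

raw = pd.DataFrame({
    "Strategy": ["S1", "S2"],
    "PeriodEndDate": ["2019-08-31", "2019-08-31"],
    "GrossProfit": [1.4, 1.5],
})
alerts = pd.DataFrame({
    "Strategy": ["S2"],
    "PeriodEndDate": ["2019-08-31"],
    "IRR_MOIC_Opposite_Alert": [True],
})

# Left-merge on the natural keys, then mark rows with no alert as False so every
# raw row carries its own alert columns into Power BI.
merged = raw.merge(alerts, on=["Strategy", "PeriodEndDate"], how="left")
merged["IRR_MOIC_Opposite_Alert"] = merged["IRR_MOIC_Opposite_Alert"].fillna(False)
print(merged)
```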

10.2.2 Project Management

Working with clients

Throughout this project, we learned how to work with clients. The process of translating abstract ideas into concrete business and functional requirements in the real world is very different from any classroom experience. Using a variety of techniques, our team refined our ability to ask the right

questions and determine what the client and end users were really interested in. As we became more

familiar with the software environment, data structures, and financial terms, it became easier to identify

the needs of the client.

Additionally, we learned that taking good notes even during small interactions with our project sponsor

helped us keep a good record of feedback. This allowed us to triangulate a solution from all feedback

with a bias towards what feedback was most recent. If we only pursued what was mentioned at the

most recent meeting, as we often did early in the project, we would start many items, complete few,

and overall set difficult-to-achieve goals.

We also learned to speak in the terms of the user. As we became acclimated to the business

environment of the firm, we picked up on many of the terms used in the industry to communicate key

information. By learning the definitions of these terms, we began to have far more productive

conversations when gathering requirements for future versions.

Finally, we found that stating our interpretation of a sponsor’s directive and asking whether we were correct was a productive way to determine if we understood what was communicated. This technique allowed us to

catch any misunderstandings. Getting confirmation early and often was a constant theme throughout

the project.

Iterative development with feedback

Throughout our project experience, it was apparent how important it was to receive feedback on

prototypes quickly and consistently. A large portion of the early project work was spent focusing on

back-end development with minimal feedback from users. Once we developed an initial prototype, we

were able to make greater improvements to our overall product once the user had the product in their

hands. Although this experience confirms the value of user input - a main tenant of the agile

methodology, our project was developed with less initial user input because of other factors. The main

factor was the connection between the user-interface and the back-end: Due to the way Power BI

connects to Databricks, the table we exported from Databricks had to have the same name and header-

names, or every visual would need to be rebuilt. We spent a considerable amount of time understanding

this connection.

10.3 What we would do differently

1. Determine needs of client – priorities and whether it is a want or a need

To begin, we could have improved the methods by which we gathered requirements for the Power BI dashboard. We were able to meet with the sponsor and the accountants on separate occasions to get their feedback on our product; however, we still spent hours creating and fine-tuning dashboard features that were later discarded. While meeting with our client, we tried to avoid this issue by prioritizing features based on accountant feedback. Instead, we could have gathered requirements by asking the accountants which features were “wants” versus “needs.”

2. Establish capability of tools with client

The team learned that it is essential to communicate technical limitations when developing a project for a client. Although we were not experts in any of the technologies used, we gained experience, and it became clear that some functionality desired by the sponsor would be either impossible or very time-consuming to develop. In hindsight, we would convey the limitations of the tools to the sponsor early in the design process and thus close the expectations gap between the team and the project sponsor.

3. Testing

Midway through the project, we received a set of validation files from the firm’s accountants; however, we were advised by our sponsor not to use their numbers for tests. The team learned that the accountants applied many complex and nuanced overrides that would have taken too much time to replicate. As a result, we did not have a set of ground truths against which to test our code. Instead, the team tested alert calculations by running independent SQL queries. If we were to do the project again, we would place more priority on asking for usable test data. The lack of an official ground truth created some confusion for the team and slowed down development.

4. Team Communication

Towards the last few weeks of the project, the team had to focus on developing a testable Power BI dashboard involving various tasks, and thus had to work longer hours. During this time, there was a general concern about how late we would stay at the office. Although we agreed to work on certain items until they were finished, we knew that we needed to set time expectations with one another. In retrospect, we would have established clearer expectations about how long to stay at the office and proactively established the priority of certain tasks.

5. Technical Mentors

Throughout the project, we met with several firm employees who gave us coding tips, set up tutorials, and provided feedback on our dashboard. Each time we met with them, we learned how to approach problems in new ways and gathered clearer project requirements. As a result, we feel that having more conversations with firm team members would have benefited the team greatly and may have increased our productivity.


11. Conclusion

While at the firm, the team improved the Winners and Losers report generator and developed an Azure

Validation Dashboard. By adding documentation to the Winners and Losers report generator, we were

able to help future firm employees maintain the code base. By building the Power BI dashboard, we

provided the firm’s analysts with robust and transparent calculations in a cloud-independent

environment.

Although we faced many challenges such as identifying requirements, planning appropriately, learning

domain knowledge, and optimizing our code base, we were able to overcome them by planning with the

end user in mind, iteratively developing with regular feedback, and learning powerful new tools.

At the end of the project, we were able to present our deliverables to our sponsors and exceed their

expectations.


Works Cited

Appelo, J. (2010, October 26). Agile Goal Setting. Retrieved from https://www.infoq.com/articles/agile-

goal-setting-appelo/.

Atlassian. (2020, January 3). Atlassian Documentation. Retrieved from

https://confluence.atlassian.com/.

Beck, K., Beedle, M., Bennekum, A. van, Cockburn, A., Cunningham, W., Fowler, M., … Thomas, D.

(2001). Manifesto for Agile Software Development. Retrieved from https://agilemanifesto.org/.

Boyanov, A. (2020). Python Design Patterns: For Sleek and Fashionable Code. Retrieved from

https://www.toptal.com/python/python-design-patterns.

Databricks. (2019). Apache Spark. Retrieved from https://databricks.com/spark/about.

Dennis, A., Wixom, B. H., & Roth, R. M. (2015). Systems Analysis and Design, 6th Edition. Hoboken, NJ:

Wiley.

Eriksson, D. (2016). Compliance for Hedge Funds. Retrieved from

https://thehedgefundjournal.com/compliance-for-hedge-funds/.

Garnick, N., & Klein, A. (2019, May 29). [Hedge Fund Company] Raises over $2.75 Billion for Most Recent

U.S. Real Estate Fund.

Gonçalves, L. (2019, September 1). Burndown Chart - The Ultimate Guide for every Scrum Master.

Retrieved from https://luis-goncalves.com/burndown-chart-ultimate-guide/.

Gupta, D., & Moore, K. (2019). Finite State Machines. Retrieved from https://brilliant.org/wiki/finite-

state-machines/.

Hayes, A. (2019, June 3). Internal Rate of Return – IRR. Retrieved from

https://www.investopedia.com/terms/i/irr.asp.

[Hedge Fund Company]. (2019a). About.

[Hedge Fund Company]. (2019b). History.

Lavanya, N. & Malarvizhi, T. (2008, March 3). Risk analysis and management: a vital key to effective

project management. Retrieved from https://www.pmi.org/learning/library/risk-analysis-project-

management-7070.

Microsoft Azure. (2019a). Data Lake. Retrieved from https://azure.microsoft.com/en-au/solutions/data-

lake/.

Microsoft Azure. (2019b, May 7). What Is Azure Databricks? Retrieved from

https://docs.microsoft.com/en-us/azure/azure-databricks/what-is-azure-databricks.

Microsoft. (2020). Turn Data into Opportunity. Retrieved from https://powerbi.microsoft.com/en-us/.

Murray, B. (2019, August 27). SKB, [Hedge Fund Company] Pocket $67M in Northern California.


Owler. (2020). The firm’s Competitors, Revenue, Number of Employees, Funding and Acquisitions.

Pandas. (2019). pandas: powerful Python data analysis toolkit. Retrieved from https://pandas.pydata.org/pandas-docs/version/0.25/.

Python Software Foundation. (2020). Python 3.7.6 Documentation. Retrieved from

https://docs.python.org/3.7/.

Radack, S. (2009, April 01). The System Development Lifecycle (SDLC). Retrieved from

https://csrc.nist.gov/CSRC/media/Publications/Shared/documents/itl-bulletin/itlbul2009-04.pdf.

Rubin, K. S. (2013). Essential Scrum: a practical guide to the most popular agile process. Upper Saddle

River, NJ: Addison-Wesley.

Securities and Exchange Commission. (2012, October 3). Investor Bulletin: Hedge Funds. Retrieved from https://www.investor.gov/additional-resources/news-alerts/alerts-bulletins/investor-bulletin-hedge-funds.


Appendix

APPENDIX A: User Stories

Key

Epics

1 - Improve Winners Losers Report Generator

2 - Azure Validation Dashboard

Epic 1 Themes

1 – Update Documentation

Epic 2 Themes

1 - Validate Data

2 - Present Interactive Raw Data

3 - Generate Performance Commentary

4 - Integrate Datalake into PowerBI

5 - Design User Experience

6 - Write Documentation

User Story Theme Epic Story Points Sprint

As a firm analyst, I want to add columns in the Excel template, so that I don't have to manually edit the report.

1 1 16 1

As a firm analyst, I want to populate the modified template with data corresponding to the column names, so that I don't have to manually input data into the report.

1 1 24 1

As a firm analyst, I want to delete columns in the Excel template, so that I don't have to manually edit the report.

1 1 16 1

As a firm analyst, I want to modify columns in the Excel template, so that I don't have to manually edit the report.

1 1 24 1

As a firm employee, I want to learn how to use the report system, so that accounting can manually produce reports.

1 1 24 1

As a firm employee, I want to be guided through the use of the win-loss reporting system so that I can change the output of the report.

1 1 12 1


As a firm developer I want a Video Tutorial to Guide me through adding a column to the Win-Loss Report so that I can use the report generator more effectively.

1 1 3 2

As a firm analyst I want to be able to choose variable months for my diff report so that I can validate any month pair in the database.

1 2 24 2

As a firm trainee, I want to find the difference in IRR over 1 month, so that I can learn how to use DataBricks and interact with the DataLake.

1 2 12 2

As a firm analyst I want to know if a Strategy switched from being a gain to a loss or vice versa so I can recognize performance changes which affect the overall fund.

3 2 12 2

As a trainee I want to be able to get basic information from monthly data so that I can decide what is valuable to include in the report.

3 2 12 2

As a firm analyst, I want to see the biggest month over month change in IRR at the strategy code level over the last N months, so I can make an informed decision about investing.

1 2 4 2

As a firm accountant I want to know if remMV values changed when there was no trading activity so that I can check if the data is correct.

1 2 12 2

As a firm analyst I want to know if any terminal values went to 0 over the last month so I am aware of any closed positions in a fund.

1 2 12 2

As a firm analyst, I want to see the biggest month over month change in MOIC at the strategy code level over the last N months, so I can make an informed decision about investing.

1 2 4 2

As a firm analyst, I want to see the biggest month over month change in GrossProfit at the strategy code level over the last N months, so I can make an informed decision about investing.

1 2 4 2

As a firm analyst, I want to know the "Buy and Sell" transactions over the last month, so I can make an informed decision about investing.

1 2 8 2

As a firm accountant I want to know if any terminal values changed when there was no Buying or Selling activity because this is indicative of incorrect Data copy.

1 2 36 2

As a firm accountant, I want to be able to see the largest difference in IRR between two months so that I do not have to manually find it.

3 2 12 3

As a firm accountant, I want to be able to see the largest difference in Gross Profit between two months so that I do not have to manually calculate it.

3 2 12 3

As a firm accountant, I want to know which Strat codes changed from ongoing to monetized from one month to the next, so that I can understand which Strat codes affect overall fund performance.

3 2 8 3

As a firm dev I want to know if the total terminal value is zero because if it is it should be monetized.

1 2 10 3

As an accountant I want to be able to drill down in the raw files so that I can see where the data may be incorrect.

2 2 18 3

As a firm analyst I want to filter down to specific funds so that I can perform more accurate validation checks.

2 2 6 3

As a developer I want to figure out how to connect dataframes from Databricks to PowerBI and create tables out of dataframes so that I don't have to manually create a report.

4 2 48 3

As a firm accountant I want to see data points that are outside a number of standard deviations from what are normal so that I can identify extraneous data.

1 2 24 3

As a firm accountant, I want to see missing data (IRR, MOIC, GrossProfit, Total_Cost, Total_Sales, Total_Terminal_Value) in irr_results and irr_mod_cashflows for a certain month, so that I can fix them.

1 2 24 3

As a firm trainee I want to understand which analyses are performed on which tables.

1 2 4 4

As an accountant I want this report to be easy to use so that I can accurately check company reporting.

3 2 12 4

As an accountant I want to take the sum of inflow values in cashflow, so I can use that to further analyze cashflow data.

3 2 4 4

As an accountant I want to take the sum of outflow values in cashflow, so I can use that to further analyze cashflow data.

3 2 4 4

As an accountant I want to take the sum of total terminal values in cashflow, so I can use that to further analyze cashflow data.

3 2 4 4

As an accountant I want to be able to drill down from an alert to the specific information in the results or cashflows table.

4 2 12 4

As a firm accountant, I want to know if a strategy in a fund is ongoing and if it has quantity or accrued interest, so that I can determine why there is a notable change in the data.

1 2 12 4

As a firm accountant, I want to know when there is a break in the time series for strategies, so that I can determine why there is a break.

1 2 12 4

As a firm accountant, I want to see a line graph showing the changes in price for a given SyCode, so that I can predict where it might go next.

1 2 4 4

As a firm accountant, I want to know the average change over any time period for MOIC, Gross Profit, Total Cost, Total Sales, after specifying a strategy, so that I can understand changes in Strategies over time.

3 2 48 4

As a firm accountant, I want to see the biggest month over month change in TotalCost at the strategy code level, so I can make an informed decision about investing.

3 2 12 4

As a firm accountant, I want to see the biggest month over month change in TotalSales at the strategy code level, so I can make an informed decision about investing.

3 2 12 4

As an accountant I want to be able to have a report that updates automatically so I always have the most up to date information.

4 2 8 4

As a firm accountant, I want to see if strategies in funds with end dates are monetized, so that I can determine why.

1 2 4 4

As a firm accountant, I want to know if a strategy in a fund is monetized and whether it has no quantity and no market value, so that I can determine why there is a notable change in the data.

1 2 8 4

As an accountant I want to see the biggest sycode move for any strat code so I can further analyze that strat code.

3 2 4 4

As an accountant I want to see when MOIC and IRR are moving in opposite directions so I can further analyze the story associated with it.

1 2 12 4

As an accountant I want to know when GP does not change and there are many transactions.

1 2 8 4

As an accountant I want to know when RemMV changes and there are many transactions.

1 2 8 4

As an accountant I want to know the month to month price changes for a sycode, so that I can see the biggest moves in sycode price.

1 2 8 4

As a firm accountant, I want to check if there are begin dates for strategies, so that I can see why there might be none.

1 2 6 4

As an accountant I want to see if a monetized portfolio has a terminal value OR RemMV which changes from 0 to any number.

1 2 12 4

As a firm accountant, I want to see if strategies in funds with a terminal value of 0 are monetized, so that I can determine why.

1 2 6 4

As a firm accountant, I want to see which strategies are new, so that I can determine which strategies do not have previous data.

1 2 4 4

As a firm accountant, I want to see if a sycode belongs to multiple strategies, so that I can determine how to override the data.

1 2 2 4


As a firm accountant, I want to see whether prices for sycodes change across funds, so that I can see if there were inconsistencies in the data.

1 2 6 4

As an accountant I want to see human readable commentary on which strategy codes influenced the portfolio, moved the most.

3 2 12 5

As a firm accountant, I only want to see BKRT, so that I can make decisions on a more meaningful dataset.

3 2 2 5

As a firm accountant, I want to stratify the data by Region, so I can gain insights about the progress of each region.

1 2 18 5

As an MQP student, I want to structure the final paper so it accurately describes our work at the firm, and so it is not based solely on our proposal.

6 2 6 5

As a firm accountant, I want to drill through alerts, so that I can prove that an alert is valid.

1 2 60 5

As a firm accountant I want to be able to use filters on every page for common fields such as portfolio, Business Unit, Strategy, Region, Sycode, ALERT ATTRIBUTE so that I can universally filter displayed data.

1 2 2 5

As a firm accountant, I want to see projected values based on historical data like standard deviation and linear regression, so that I can determine if my numbers are in a reasonable range.

3 2 20 5

As a firm accountant I want to see the Alert Description without the Extra linked column Visible.

1 2 4 5

As an MQP team member, I want to refresh and update our paper: omit previously used technologies and write about the new technologies used.

6 2 4 5

As a firm accountant I want to be able to see The DealName as well as StrategyCode Because I know deal names better than Stratcode.

5 2 2 5

As a firm accountant when Viewing the GP No-Change Transactions Rules I want to be able to drill down to transactions.

1 2 12 5

As a firm accountant I want Extreme IRR Values to be Filtered Out (Perhaps greater than 1000) Before the Standard Deviation Is calculated So that I only see useful Data.

1 2 3 5

As a firm trainee I want to meet with Users of the dashboard to better understand their needs.

5 2 6 5

As a firm accountant I want to Disable Grand totals on non-Applicable Fields.

5 2 5 6

As a firm accountant I want the ToolTip on Diffs to show the two values used to calculate the Diff.

5 2 8 6


As a project sponsor, I want to see when a MOIC and IRR are different in my own terms, so that I can be alerted when it happens.

1 2 4 6

As a firm accountant I want the Closed_fund_transactions field to be renamed as the monetized_stratcode_with_transactions and to only check for non-null values in the RESID Column.

1 2 4 6

As an accountant, I want to see DealName and StrategyRegionofRisk as columns and as filters so I can effectively analyze the data and utilize the PowerBI Dashboard.

2 2 4 6

As an accountant, I want to be able to filter on portfolio (including ALL), so that I can assess strategies on a general level.

1 2 24 6

As an accountant I want to see the Absolute value of all Diffs so that I can sort them.

5 2 12 6

As an accountant I want to see relevant usable filters on each report page so that I can filter the data appropriately.

2 2 4 6

As an accountant, I want to just see values where the alert is true, so I see data respective to that alert.

5 2 4 6

As an accountant I want a Top Level Summary page that contains the data and a well organized way to access alerts.

5 2 24 6

As a developer, I want to learn how to properly use the slicer to arrange data to the accountant's satisfaction.

5 2 3 6

As an accountant, I want to see a flat list of transitions: raw data.

2 2 3 6

As a user of the PowerBI dashboard, I want the column names to be easier to understand, so I can better understand how the data is represented.

5 2 20 6

As an accountant, I want to see a description of each page, so I understand how to use the data provided and further understand the alert and its check.

2 2 12 6

As a firm developer I want to see commented Code so that I can maintain the software.

6 2 3 6

As a firm developer I want to add "changes" to IRR, MOIC, Buttons so that the RAW explorer is more useable.

2 2 1 6

As a firm accountant I want to filter the entire Report by Investment type so that I do not see irrelevant cash transactions.

1 2 2 6

As a project sponsor I want the headers of each page to be the same on every page so that there is consistency in design.

1 2 3 6

As an accountant I want to see commentary for all return periods: 1-year, 3-year, 5-year, and year to date.

1 2 12 6


As a project sponsor for each dealname I want to see the Average Min Max LR for IRR, MOIC.

1 2 24 6

As a project Sponsor I want to see a graph of time series data with IRR GP MOIC all in one visual.

2 2 1 6

As a project sponsor I want to see a Descending sort of a diff between current value and average Value.

2 2 8 6

As an accountant, I want the commentary to be split up into sections based on ReturnPeriod, so that I can easily digest the commentary section.

5 2 24 6

As a firm accountant I do not want to see total terminal value transaction types on gross profit same but transaction exists page because these types are not relevant.

1 2 1 6


APPENDIX B: Project Risks Per Sprint

Sprint 2

Sprint 3

Sprint 4

Sprint 5

Sprint 6


APPENDIX C: Interview 1 with Firm Accountants

Firm Accountant Meeting 11.11.2019

Attendees: Firm Accountant 1, Firm Accountant 2, Manasi Danke, Ethan Merrill, Joseph Yuen

Objective: Ask accountants about validation procedure and priority of checks

Introduction

1. WPI Project Description

Questions:

1. How do accountants validate cashflows? - Process Overview

a. Use a series of excel sheets that check for certain behavior

b. Accountants demoed their excel sheets

c. Commentary is used for marketing purposes and explains why certain behavior

happened

d. Evan sent us the steps and excel sheets that go through the validation process

2. What do you check first? What is the priority of different validation techniques?

a. Compare cashflows and profit & loss

b. Evan – new Strategy codes

c. Doug – inverse change in IRR and MOIC

i. IRR – time based – cash weighted return

ii. MOIC – total return type metric

d. Zombie Strategy codes


APPENDIX D: Interview 2 with Accountants

Firm Accountant Meeting 11.21.2019

Attendees: Firm Accountant 1, Firm Accountant 3, Manasi Danke, Ethan Merrill, Joseph Yuen

Objective: Gather feedback on PowerBI Dashboard V0.3

Reactions:

Overview – No feedback

Explorer

• Show greatest movements

• Ability to filter on Strategy code

• Add Deal Name

• View change in GrossProfit

o Should equal PNL for the month

Alerts

• Show new Strategy codes

• Show MOIC negative changes

• Add drillthrough functionality

Statistics – No feedback

Commentary

• Add month over month values

• Useful

Time Series

• Add region

o Be able to filter on region

• Add report filter wide for region

Forecast

• Risk team already handles projections

Post Demo Questions:

• What do you like about the dashboard?

o Liked drilldown ability to see Strategy code level on a deal by deal basis

• What could be improved?

o We want to see Strategy code gross profit over time

• Could you see yourself or your department using this? Please explain?

o Drilldown ability in explorer could be useful


APPENDIX E: Financial Terminology

Asset Valuation

To understand how an investment has performed, the actual book value of the asset itself must first be determined. For publicly traded assets, the Net Asset Value (NAV) is used. The formula for Net Asset Value is (Assets − Liabilities) divided by the number of outstanding shares. NAV is commonly used to determine the value of assets before any additional fees are charged by the brokerage or other entities in the trading pipeline.
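Written as an equation, with a small hypothetical example rather than a figure from the firm:

```latex
\mathrm{NAV} \;=\; \frac{\text{Total Assets} - \text{Total Liabilities}}{\text{Shares Outstanding}}
```

For instance, a fund with $110 million in assets, $10 million in liabilities, and 5 million outstanding shares would have an NAV of ($110M − $10M) / 5M = $20 per share.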

Hedge funds such as the firm often have investments in less liquid assets such as real estate. Valuing these assets is more difficult and may not be performed more than twice per year, because the assets are illiquid and their value does not change rapidly.

Internal Rate of Return (IRR)

Internal Rate of Return is a percentage measure of the growth of an investment. More specifically, IRR measures the annual compounded rate of return for an investment. This metric is commonly used to determine the potential rate of return on future projects; however, it can be used for existing projects or investments as well. The Internal Rate of Return is related to the Net Present Value by the following formula (Investopedia):
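In its standard form, the IRR is the discount rate that sets the Net Present Value of the cashflows C_t over periods t = 0, …, T to zero:

```latex
\mathrm{NPV} \;=\; \sum_{t=0}^{T} \frac{C_t}{(1 + \mathrm{IRR})^t} \;=\; 0
```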

Multiple of Invested Capital (MOIC)

This metric is the total return divided by the original invested capital. For instance, a 10x MOIC could be the result of a $1 investment that returned $10, or of a one-million-dollar investment that returned ten million.

Gross Profit

In business accounting, gross profit is calculated by subtracting the cost of goods sold from revenue. In investing, the gross profit on an investment is the cash amount by which the investment has appreciated since inception.

Remaining Market Value (remMV)

Remaining Market Value is the total value of the investment, Strategy portfolio, or fund at the end of the accounting period.

Total Cost

Total cost is the cash amount expensed in order to acquire the asset.

Total Sales

Total sales is the cash value of an asset that was sold during a given transaction.


Total Terminal Value

Total Terminal Value is very similar to remMV; however, it is calculated for every transaction, not just at the end of the reporting period.

Return Period

The return period is the length of time over which the return is calculated. The most common return periods are year to date, inception to date, one year, three years, and five years.

APPENDIX F: Site Map


APPENDIX G: Site Structure Diagram
