
Building the Watcher Serverless app on AWS

The application

Watcher is a simple productivity tool to check for changes on websites. Create individual watchers by setting the location (URL) of the page and the path (XPath) of the content to monitor. The application then checks for any change in the content every 6 hours. It is an API-first and API-only application with four endpoints: create a watcher, list watchers, delete a watcher, and test-run a watcher.

Source code and deployment instructions for AWS are available on GitHub: designfultech/watcher-aws

Planning

I implemented the same application in Python running on Google App Engine, which gave a good baseline for comparison.

I was looking to build on a new architecture in a new environment. The application was going to be built on AWS in Node.js, using as much of the AWS Serverless capabilities as possible.

AWS is in many ways the front-runner of Serverless, and their Lambda service is one of the leading Function as a Service (FaaS) engines.
Node.js is a frequent choice of language for FaaS implementations, and I wanted to see how it would compare to Python.
I was keen to use as much of the readily available functionality and as many of the services as possible, and to write as little infrastructure code as possible (not to be confused with infrastructure as code).

Getting started

Serverless is full of infrastructure configuration, also known as infrastructure as code. It was a decision from the beginning that everything goes into code – no (Web) console monkey business in the final result. Wiring up and configuring the services in code (YAML) has a bit of a learning curve, and comes with a few surprises – more about these later. Production-ready code requires a good amount of DevOps skills; there is no way around it with this architecture.

Other than picking the right language and suitable libraries, there is not much else to decide for the main logic. As for the infrastructure code, there are far too many options available.

Approach to infrastructure as code

The choice was to go with SAM and drop into CloudFormation where necessary.
Ultimately all the solutions will have a straight line down to the platform SDK.
The main question with third-party solutions is how well and how closely they can follow platform changes. Do they support the entire platform and all the attributes needed?

Operating a platform in actual code lacks abstraction and is too verbose.
Native platform scripts and templates offer the right level – close enough to the platform SDK for fine-grained control, but far enough above it in abstraction to be manageable. SAM, as a bonus, can raise the level of abstraction further to make life even easier. If there is a future for infrastructure as code, it must be to do more with less code.

Other projects with different conditions may find the alternative solutions more suitable.

  • A quick prototype probably makes better use of a toolkit like Serverless Framework.
  • A large scale, multi-provider cloud environment could use a general purpose engine like Terraform.
  • An adaptive environment with lots of logic in provisioning resources would need a coding approach like Troposphere.

Designing the application

Given the service-oriented nature of Serverless and FaaS, a service definition seems to be a good start. Swagger (OpenAPI) is a good specification and has good platform support, on AWS and in general. Another reason for choosing SAM templates was the reasonable level of support for Swagger.

Swagger specifics

  • SAM does not handle multi-part Swagger files at the time of writing this article, which would be an issue for a larger project. One solution could be to use a pre-processor, something like Emrichen, which looks very promising.
  • Unfortunately, platform specifics and infrastructure as code leak into the API definition; there is no easy and clean way around it. Therefore the Swagger definition is peppered with x-amazon- tags in support of the AWS platform and SAM. Perhaps the right pre-processor could merge in external, platform-specific definitions in an additional step during deployment.

Platform services

Here is a high-level diagram of the AWS platform services in use in the application. Many more relevant bits could go onto this diagram that are not obvious at first – more about these later.

API Gateway was a given for reaching the REST APIs in the application.
DynamoDB can easily meet the database needs. It is also easy to expose CRUD + Query functions via the API Gateway.
Lambda was the primary candidate for the two services doing most of the work for the application.
CloudWatch has a scheduler function that comes in handy for triggering the regular checks.
Fanning out the individual checks can be done using a queue mechanism; SQS does the job well.
Finally, notification e-mails go out via SES.

Cross-cutting concerns

This project was specifically avoiding getting into cross-cutting concerns such as application configuration, security, logging, monitoring. They will be explored in another project.

Application configuration uses environment variables set at deployment time.

Security uses simple API key authentication. The implementation uses IAM’s roles and policies to authorise access, but it does not follow any recognisable model or method at this time.

Logging and monitoring were not explored in any detail. The first impressions of the vanilla logging facility were poor. The lack of out-of-the-box aggregated logging is a big surprise.

Building the application

Database (as a Service)

DynamoDB is a versatile key-value store with built-in REST endpoints for manipulating the data. I was keen to make use of the built-in capabilities and see how far they stretch. While DynamoDB does the job for this project, it would become increasingly laborious to use for even slightly more complex jobs.

My very first task was to hook up DynamoDB operations to the API Gateway and be able to create new items (records), list them, and delete them without involving Lambda. It is all configurable in Swagger with relative ease.
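Hooking a POST method directly to DynamoDB looks roughly like this in the Swagger definition – a sketch only, with illustrative region, role, and table values rather than the project's actual ones (the full mapping template is shown further below):

paths:
  /watchers:
    post:
      x-amazon-apigateway-integration:
        type: aws
        httpMethod: POST
        uri: arn:aws:apigateway:eu-west-1:dynamodb:action/PutItem
        credentials: arn:aws:iam::123456789012:role/WatcherApiRole
        requestTemplates:
          application/json: '{ "TableName": "watcher-items", ... }'
        responses:
          default:
            statusCode: "201"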

Observations

  • I could not return the newly created item in the response to the client. There is no support for ALL_NEW or UPDATED_NEW in ReturnValues for the PutItem operation, so the API now returns an empty object with a 201 status.
  • Mapping templates (request and/or response) in Swagger look like code in the wrong place.
  • Mapping actually uses another template language – Velocity. An immediate limitation I hit was trying to add a timestamp to the created_at attribute. There is no way to insert an ISO8601 UTC timestamp. The solution was to use $context.requestTime, which is CLF-formatted. The only other alternative would be the Epoch time. A minimal example follows below.
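For illustration, a minimal PutItem request mapping template (Velocity) might look like this – the table name and attributes are hypothetical:

{
  "TableName": "watcher-items",
  "Item": {
    "id": {"S": "$context.requestId"},
    "url": {"S": $input.json('$.url')},
    "created_at": {"S": "$context.requestTime"}
  }
}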

AWS Lambda in JavaScript (Node)

There are two functions implemented on AWS Lambda:

  1. Checking a single item (web page) for any change.
  2. Periodically getting the list of watcher items and launching individual checks.

The code is fairly simple and self-explanatory for both functions; see the comments in the source code on GitHub for more detail.

Coding Lambda functions in Node/JavaScript was a breeze. Adopting the function*() + yield combo in the programming model makes it very easy to write sync-like code in this inherently async environment. Defining the functions in SAM is straightforward.
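As an illustration of the pattern – a minimal sketch, assuming the co library as the generator runner and a hypothetical watcher-items table:

const co = require('co');
const AWS = require('aws-sdk');

const dynamo = new AWS.DynamoDB.DocumentClient();

exports.handler = (event, context, callback) => {
  co(function* () {
    // yield on a promise to get sync-looking async code
    const result = yield dynamo.get({
      TableName: 'watcher-items', // hypothetical table name
      Key: { id: event.id }
    }).promise();
    return result.Item;
  }).then(
    item => callback(null, item),
    err => callback(err)
  );
};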

I managed to focus on coding the application features without much distraction from infrastructure code. The one task that did not work well was manipulating data in DynamoDB. The AWS SDK is too low-level; it would not be efficient to use on a larger, more complex project. Next time I would definitely look for a decent ORM library, an abstraction on top of the SDK, something like Dynamoose.

The AWS SDK is bundled with the Lambda runtime. It does not need to be a runtime dependency in package.json; it can be a development dependency only.
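In package.json this means listing it under devDependencies only (version illustrative):

{
  "devDependencies": {
    "aws-sdk": "^2.400.0"
  }
}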

Application integration

Every once in a while – set to 6 hours by default – one of the Lambda functions retrieves the list of watcher items to check for any change. There are a few different strategies to overcome the restrictions of the Lambda function timeout; fanning out the check function seemed an easy and cost-effective way to go. Each item from the list is passed to an SQS queue in a message, then picked up by the target function – see the sketch after the list below.
A few considerations:

  • Messages are processed in batches of 1 to 10 depending on the configuration. This use case needed to process only 1 message at a time.
  • If the check fails for any reason, the message is put back into the queue and retried after a short while. Set a maximum number of deliveries on the message to avoid countless failed function calls and associated costs.
    Another control mechanism, for other use cases, is to set the expiration on undelivered messages.
  • It is worth defining a dead-letter queue where failed messages are delivered after all tries are exhausted or timed out.
  • Queue resources are not available in SAM; I had to use CloudFormation AWS::SQS::Queue definitions.
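The fan-out itself is only a few lines of function code – a sketch, assuming the queue URL is exposed to the function via an environment variable:

const co = require('co');
const AWS = require('aws-sdk');

const sqs = new AWS.SQS();

// Send one SQS message per watcher item so that each check
// runs in its own Lambda invocation.
function fanOut(items) {
  return co(function* () {
    for (const item of items) {
      yield sqs.sendMessage({
        QueueUrl: process.env.WATCHER_QUEUE_URL, // assumed to be set in SAM
        MessageBody: JSON.stringify(item)
      }).promise();
    }
  });
}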

The wide range of event sources supported by Lambda is very powerful. Be aware that event sources can have very different-looking event data, and determining the source from either context or event is not trivial.

One of the functions – run-watcher – can be invoked via API Gateway and via SQS message. These two sources have entirely different message schemas. A small piece of code is responsible for detecting the source based on event-specific attributes and parsing the message accordingly. In an ideal world, the AWS infrastructure would provide explicit information about the source of the different events.
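A sketch of such a detection step, keying off attributes specific to each source:

// Detect whether the event came from SQS or API Gateway
// and extract the watcher item accordingly.
function parseEvent(event) {
  if (event.Records && event.Records[0] &&
      event.Records[0].eventSource === 'aws:sqs') {
    return JSON.parse(event.Records[0].body); // SQS batch of size 1
  }
  if (event.httpMethod) {
    return JSON.parse(event.body); // API Gateway proxy event
  }
  throw new Error('Unknown event source');
}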

SAM’s intricacies

The aforementioned scheduling of periodic checks was done using the CloudWatch Events scheduler. This was easy enough to configure in SAM as an event on the Function.

SAM’s ability to create resources as necessary is a double-edged sword. While it significantly reduces the amount of infrastructure as code, it also introduces surprises for those who are not intimately familiar with all the platform-specific resources of their application.

The scheduler resource is a good example to investigate. In SAM the scheduler event was defined in 4 lines:

Timer:
  Type: Schedule
  Properties:
    Schedule: rate(6 hours)

The CloudFormation equivalent would be something like this:

WatcherScanScheduler:
  Type: AWS::Events::Rule
  Properties:
    Description: Schedule regular scans of watchers
    Name: WatcherScanScheduler
    ScheduleExpression: rate(6 hours)
    State: ENABLED
    Targets:
      -
        Arn: !GetAtt WatcherScanFunction.Arn
        Id: WatcherScanFunction

The SAM version is ultimately expanded by the tool to the same CloudFormation definition. Notice that in SAM there is no place for some of the properties found in the explicit CloudFormation definition. This is why many of the resources generated by SAM behind the scenes show up with seemingly random names. A good summary of the resources generated by SAM is at https://awslabs.github.io.

This behaviour is not limited to SAM; many third-party tools that provide an abstraction on top of CloudFormation have the same effect.

Resources depend on other resources. In most cases dependencies are satisfied by inserting a reference to the resource using its name or its ARN.
When the individual resources of a stack are created, some of the references may not resolve because the pertinent resource does not exist yet. The solution to this problem is to explicitly state dependencies in the template. The usage plan (AWS::ApiGateway::UsagePlan) in the SAM template is a good example to analyse.

WatcherApiUsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  DependsOn: WatcherApiStage
  Properties:
    ApiStages:
    - ApiId: !Ref WatcherApi
      Stage: !Ref EnvironmentParameter
    Description: Private Usage Plan
    Quota:
      Limit: 500
      Period: MONTH
    Throttle:
      BurstLimit: 2
      RateLimit: 1
    UsagePlanName: PrivateUse

Trying to deploy the stack without the explicitly stated dependency DependsOn: WatcherApiStage will fail because the Stage, which is automagically generated later by the SAM template together with the API (AWS::Serverless::Api), cannot be found.
This case is further complicated by SAM’s abstraction of API resources and its hiding of the Stage resource definition. Where is the name WatcherApiStage coming from? It is generated by SAM as described at this link.

The SAM CLI has a validation function to run on the SAM template and the Swagger API definition. The current validator has little value, failing to catch even simple schema violations. The true validation of the template happens when CloudFormation attempts to build the stack. In the future, infrastructure as code must have the same level of syntax and semantic checking support as other programming languages.
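For reference, the validation is a single CLI call:

sam validate --template template.yaml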

Cross-over between infrastructure as code and application code

Occasionally the application code must share information with the infrastructure as code, for example the ARN of the queue the application sends messages to. In this case, SAM sets an environment variable on the Function that picks up the queue ARN using a utility function (GetAtt). The environment variable is read during execution to get the queue ARN for sending messages. This works well in most cases, where infrastructure definitions are made in one place only and picked up during deployment, and the references do not change after deployment.
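In SAM this looks something like the following – a sketch, with illustrative resource and variable names:

WatcherScanFunction:
  Type: AWS::Serverless::Function
  Properties:
    Environment:
      Variables:
        WATCHER_QUEUE_ARN: !GetAtt WatcherQueue.Arn

The function then simply reads process.env.WATCHER_QUEUE_ARN at run time.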

Setting the region for SES is a similar case, but warrants a short explanation. SES is only available in 3 specific regions, and the application must use one of those for sending e-mails even if it is deployed elsewhere. There are no SES resources (AWS::SES::…) defined in this project, yet the SES region is defined in SAM as an environment variable. It could have been a configuration in application code, but since SES is an infrastructure element, the region configuration is best placed in SAM.
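In code this boils down to constructing the SES client with the configured region – a sketch, with an illustrative variable name:

const AWS = require('aws-sdk');
const ses = new AWS.SES({ region: process.env.SES_REGION });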

Summary

The application works; it does what it is supposed to. There were some trade-offs made and there are areas for improvement. But most importantly, the architecture and the platform have huge potential.

Building the application was easy enough. Was it as easy as on Google App Engine? That is for the next time.


The Search for a New Stack

The Google App Engine Standard Environment and the ecosystem around it has provided a comfortable and solid software stack in the past.

There is a general move towards standalone services, as is the case with Tasks and Scheduler.

  • New ones were introduced (Firestore, Memorystore),
  • some of them were deprecated (mail, Channel API),
  • while the fate of a few is unknown (memcache, search).

There was a mix of changes on the (Standard) platform:

  • Version 2 runtimes, auto-scaling containers (gVisor)
  • Move to REST APIs, away from the proprietary Google API.

These are generally welcome changes, and while the platform is currently in a state of flux, some of these changes have removed the comfort and ease of building software for the platform.

The search begins

Writing infrastructure code has eaten into IT projects for a long time. It causes all sorts of project slippages, budget overruns, and enormous technical debt.

Developing your own server – web, messaging, socket, even database – or building a new web application framework, or client library are all “writing infrastructure code”.

Therefore it is imperative that coding focuses on business capabilities and value with the new stack.

It also means we can do away with the server infrastructure – servers, VMs, containers. While we are at it, it should remove the associated scalability and availability concerns too.

Costs should align with business value, and include effective use only, for example: computing time (no idle), messages sent, users registered, data stored, files transferred, jobs scheduled, and so on.

The stack has to sit on a platform with a rich ecosystem – a good range of core services for example:

  • database,
  • object storage,
  • messaging,
  • identity management.

Some of the best in class capabilities could live outside of the platform, for example:

  • payments,
  • bulk and transactional emails.

The stack must have the facilities to gain insight into its inner workings: support for variable levels of centralised logging, both from hosted services and from a logging API in bespoke code.

Support for monitoring of platform and application services, alerting on specified issues and metrics.

There must be adequate tooling support for the entire application lifecycle, extended to

  • environments, including local development,
  • programming languages, including relevant tool-chains.

A worthy candidate

The emerging paradigm, Function as a Service (FaaS) together with Serverless, promises to meet my expectations. It would have to be part of a comprehensive cloud offering that delivers the rest of the stack.

FaaS/Serverless reinforces cloud-native concepts, and that leaves a lot of questions open for this new stack.

There is still the Google App Engine Standard 2nd Generation with the rest of the services from the Google Cloud Platform. If it wasn’t for the challenges that prompted the search for a new stack, it would be one of the best options available. Perhaps it will be again some day. In the meantime, is there something better?

Is it going to deliver on the benefits? What are the trade-offs going to be? Let’s see.


Robo-Advisors: Digital Transformation in Investment Management

When hearing the expression robo-advisor, you might be thinking of Siri, or Alexa, or one of their robo-siblings giving you advice about the next big investment opportunity. That’s not what these robots are about, at least not yet. Rather than giving advice, at the moment robo-advisors are nothing more than automated portfolio managers. These digital investment services with minimal human intervention are aimed at investors who want inexpensive savings plans and a convenient way to manage their investment choices.

Digital Transformation in Financial Services

Meet Tim, head of product and one of the founders at a rapidly growing FinTech startup. He started by telling us about the early days of their company.

…algorithms have come a long way and together with new technologies they have transformed everyday activities in ways that allow cars to assist people with driving, make recommendations for music to listen to, news to read, even match people for a date. Recent disruptions to the financial services sector target how we make payments, arrange lending, and consume advisory services.

Digital transformation has been upon the financial services sector for some time. Tim’s company discovered that the lack of innovation in parts of the sector made it an ideal target for disruption.

We realized that a segment of advisory services was ripe for automation; that’s how we ended up in wealth management. The business processes for these services are simple enough to allow automation to a great degree — provided that the needs of our customers fall within a range with regard to acceptable risk, investment choices, and mode of interaction.

He continues:

Until recently, the two options for investing were self-invest (DIY) or the involvement of asset managers for a fee. Recent studies about customer satisfaction also revealed that financial services rank at the bottom of the industry sector list. Many customers had bad experiences with advisors, which prompted the question: would they trust a machine more than a human? We saw an opportunity to offer a third option: automated investment services. The technology was already available to us to build our digital product – smart algorithms backed by deep learning, interconnected digital ecosystems, cloud-hosted platforms.

Tim explains how they differentiated their product in this space:

When we set out to define our product and services, we specifically focused on making them…

Be easily accessible – starting with web access, and increasingly putting more and more effort into the mobile channel.

Be more economical by reducing costs significantly, and making them accessible to a larger customer base.

Be more transparent about pricing, risks, and really all aspects of the services we offer.

The time was also right for tapping into a new breed of customers, who are increasingly digitally literate and open to exploring new channels and ways of interacting not only with their peers but also with businesses.

Financial services is a prime target for agile, digital companies eager to transform the customer experience and automate operations. Tim’s company had to move fast:

We wanted to launch quickly because of the fierce competition between providers of financial services fueled by the latest technological innovations. We have launched our robo-advisor product to a group of early (beta) customers in 6 months.

What is the market for robo-advisors?

Ana is a twenty-something graduate from a top school, currently working at one of the multi-national firms. She is a typical millennial in many ways – she enjoys hanging out on social media and updates friends using a variety of messaging apps. Time is important to her: she hates waiting in line, gets food delivered, and works strictly office hours.

Although she spends a lot of time with friends and followers, and listens to influencers, she first heard about investing from her senior colleagues during a conversation. It was not long before she came across Tim’s company, offering investment products to young folks like her.

She commented:

The ease of use was a big plus. It completely demystified complex financial products and services.

I liked that I could invest small amounts of money. I can only afford to save a few hundred per month at this time.

I also liked the personalized services made available right from the app, and the fact that I did not have to go to some branch, queue, deal with paperwork, and engage in boring small talk just to open an account.

Customers of recent generations are, in general, more comfortable interacting with digital services than previous generations were. The digital customer journey is, and will be, a differentiator amongst robo-advisor providers – not only for making customer acquisition more efficient, but also for retaining customers for the long term. This tendency allows services in general to replace human tasks with automated ones, and to direct interactions to purely digital channels such as web, chat-bots, apps, and others.

Tim’s reflection on the market:

Millennials are increasingly looking for ways to manage their money; with cheap investment options and economies of scale, our robo-advisor services are able to fill a gap for this customer segment.

There is also increasing evidence suggesting that actively managed funds, and their human managers, are unable to produce the returns that plain index funds can deliver in the long term. This gives us the opportunity to look at new customer segments and also engage with Baby Boomers at or approaching retirement, and High Net Worth Individuals for our new services.

Investment advice is inherently complicated for most people — anything associated with financials tends to be complicated naturally. Robo-advisors allow a set-it-and-forget-it approach to investment, putting it on an auto-pilot that performs best in the long term.

Tim also admits, it has not been a smooth ride for the company.

While the technology is in place, and the market together with demand is seemingly there, the marketing effort and costs involved in turning our robo-advisor into the product we wanted to offer were grueling. It is a service that competes on ease of use, and most importantly on price.

The rock-bottom pricing means that we must seek profit from economies of scale and long-term customer loyalty. Achieving a higher customer lifetime value over customer acquisition cost in this environment is not a trivial task, and keeps our marketing and sales departments on their toes.

At the same time, technological advances have created an environment that makes product innovation, development, and launch increasingly rapid. Digitization of capabilities and processes has allowed the introduction of new products and business models, and lowered the barrier to entry for many new businesses. The marketing challenges and increasing competition put Tim’s company under pressure.

As soon as the first version of our product was out, we started working on the next iteration. We had major plans, and an ambitious roadmap.

A Day in the Life of a robo-advisor

Two years after those first beta users at Tim’s company, multiple robo-advisor platforms have sprung up sharing the same mode of operation:

  1. Customer acquisition
  2. Customer assessment, investment portfolio selection or construction
  3. Continuous portfolio monitoring and rebalancing

Tim explains:

Customer acquisition is a critical step in the process although not tightly related to the advisory activities. As with many digital offerings, the marketing and selling of the service is just as important as providing a good service itself.

It is the customer assessment, investment portfolio construction, and rebalancing activities where our smart algorithms take center stage and differentiate us from others. The assessment is based on a customer questionnaire with the purpose of homing in on the critical parameters of the most suitable investment portfolio.

Modern Portfolio Theory provides the foundation for portfolio construction, informed by the goals and parameters — such as appetite for risk or expected returns, investment duration, amount of money invested, and so on.

For most services, the universe (a set of securities with common features) from which investments are selected is limited to ETFs. For customers with a larger investable amount on their account, we also offer the option to allocate a portion of the investment to trading individual shares on the stock market. Our system regularly rebalances the portfolio, based on our proprietary strategy, to maximize performance considering current market trends and industry performances.

Naturally, we also send regular updates to our customers about the performance of their investment, and inform them about our efforts to better manage their assets.

Although nobody should expect to become a financial guru from using robo-advisors, user experience plays a critical role in eliciting a more thoughtful customer behavior for managing finances. People tend to overestimate their tolerance for risk when it comes to their investments. Robo-advisors allow for more guided and informed decisions on investments, and protection of assets from unwanted risk.

Transforming an industry is not an easy task

Let us not forget that many of the industries in existence today, including the ones from the financial services sector, have been around for a long time. In comparison, the Digital Transformation movement is in its infancy. Bringing together people, process, and technology to make a business successful remains a challenging task – a statement Tim agrees with.

While most robo-advisor services, including ours, are just front ends to existing financial systems, the effort going into providing an excellent user experience to our customers on desktop, and increasingly on mobile channels, cannot go unnoticed. Removing the worry from the process and offering a hugely valuable service, while hiding the complexity of the underlying financial systems, is a feat that few of the robo-advisors are capable of managing well.

The opportunity for robo-advisors comes from offering cheaper, superior investment advice, and reaching a massive market with the low-cost service. These services charge roughly an order of magnitude lower rates (0.1% instead of 1%) to manage individual portfolios. At these rates, client acquisition remains a question and a challenge for most robo-advisors.

We asked Tim – how do you make sure the profitability of the customer grows beyond the costs of acquisition?

“Indeed, some of the services fell for the if you build it, they will come fallacy, and ignored the significant effort required on the digital marketing side.

After recognizing the marketing challenges, we immediately started looking for new opportunities to turn profit. We set out two paths in our strategy:

  • Partnering with established financial firms with wealth management arms, which could allow us to tap into a more affluent segment of the market.
  • Discovering new channels and digital marketing platforms for reaching customers at a lower cost, experimenting with multiple social media channels.”

What is the reaction from the financial services industry?

Most recently, robo-advisor companies looked to move from direct to consumer (B2C) to partnering with existing advisors and adopting a business-to-business (B2B), or B2B2C model. The immediate opportunity was in leveraging automated advisory services and augmenting human advisors.

Rob, who works at the wealth management arm of one of the largest banks, explains:

The emergence of ETFs and index trackers gave rise to a simpler and a lot more economical investment option: passively managed investments, as opposed to actively managed funds and portfolios involving human fund managers.

At the same time, robo-advisors have transformed the way investment portfolios are constructed and managed. Robo-advisors have the additional benefit of simplifying, even removing, the complexities of tax implications, investment methodology, and philosophy choices.

We have observed that some people have difficulties entrusting their investments wholly to a computer – something that lacks personality, empathy, and human intuition.

We believe that the emerging trend of hybrid robo-advisors provides a solution by mixing automated and traditional human services. With the hybrid model, human advisors can focus on what they do best — consulting the client about their financial position, and needs on a more personal level — and scale their services at the same time.

Until the underlying products and technologies for automated investment advisor services develop further, human advisors will have the benefit of leveraging current automation services, and augmenting their own offerings.

Rob’s bank has tremendous interest in robo-advisors and Rob is leading the effort internally. His team recently started working with Tim’s company.

We have looked at a variety of products and features available on the market at the time, and found a wide spectrum across the board.

Here is a summary of some of the findings from the bank’s research analysis, focused on the limitations and differences among the automated investment advisory services:

  • The level of personalization provided by the robo-advisors varies widely, but in general remains limited. A degree of personalization comes from the variety of niches available in this space aimed at women, students, or Islamic investors for example.
  • Most services select pre-constructed portfolios based on a limited set of outcomes from the customer assessment.
  • Investment strategies and features such as tax optimization, tax harvesting, and rebalancing are not available to all services, or to all customer segments.
  • Tax-loss harvesting, for example, requires buying individual shares rather than ETFs in order to account for losses in individual shares.
  • User experience, from onboarding to regular updates, is a huge differentiator among the businesses.
  • In a similar fashion, channel strategy and digital marketing also set many of these services apart.
  • There is also a geographic differentiation of these services, driven by regional specifics such as state of economy, regulation, and customer behavior.
  • The countries with the most robo-advisor services at the time were: United States, United Kingdom, Canada, and Australia. Asian and South American countries were also represented in much smaller numbers.

Rob summarized the bank’s stance on the industry itself:

We have been pleasantly surprised and happy with the hybrid robo-advisor model. The partnership with Tim’s company allowed us to introduce a new set of products for our clients.

Automated investment advisory services made an impact on the wealth management industry and will continue to grow. We believe it will take a long time for robo-advisors to claim a significant portion of all assets under management. Until then, human advisors will continue to have a significant role in this industry.

What is the future for robo-advisor services?

We also asked Tim what he thought about the future directions for the industry.

Ultimately, robo-advisory services would be able to learn about the clients’ financial needs and investment behavior, and advise based on up-to-date individual characteristics.

Data collection and analysis will always have a tremendous role in collecting and connecting relevant data, and in turning it into meaningful information for machine processing. Forming the ideal data set for the best advisory service will have to overcome privacy and security concerns, as well as quality issues for an ever-increasing mountain of data.

Future development stages of strong machine intelligence will be able to bring new levels of personalization to investment advisory. They will gradually incorporate qualitative (as opposed to quantitative) assessments in portfolio allocation, and will eventually develop an edge over their human counterparts. The research in artificial intelligence will fuel new developments in these fields, and increase the benefits to the clients of investment and other advisory businesses.

Final thoughts

ETFs brought the democratization of financial services with their simplicity and low-cost structure. Social media platforms will further democratize access to and delivery of such products, and the services based on them. The combination of market trends and new technologies gave way to the emergence of robo-advisors. Advances in the fields of data analytics and research in artificial intelligence will gradually remove current limitations, and services will eventually evolve beyond today’s best human advisors.

In the future, we can expect Alexa to wake us up to some good news about our financials.

Hello Dave, your portfolio is looking good today. Last night I reallocated some moderately performing investments into a higher risk/benefit asset, in line with your preferences. Would you like to know more?


Managing Enterprise Architecture using Semantic Wiki

The basic idea is to use Semantic Web technologies to support the Enterprise Architecture practice in the organisation. Semantic Web offers a set of capabilities that makes it ideal for this purpose.

What is available?

  • Semantic MediaWiki
    The current version (at the time of writing: 1.8.x) is a sufficiently mature implementation of semantic features on top of MediaWiki. There is a suite of extensions – most of them part of the Semantic Bundle – available to extend the basic feature set.
  • Enterprise Architecture ontology
    The basis for enabling SMW to be used as an Enterprise Architecture (EA) tool is the EA ontology (call it meta-model, or schema). It defines a set of concepts and their relationships that will help to organise and structure the information gathered about the enterprise. One of these ontologies can be easily derived from the ArchiMate (V2) meta-model.
  • Triple storage for semantic inferencing and querying (SPARQL)
    Triple stores are a bit tricky in the current version of SMW. The main reason for using them is to take advantage of features like symmetric properties and inverse properties, which offer a lot of value when querying – see the example query below.
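    For example, an inline query over a hypothetical EA wiki (the category and property names are illustrative) could look like:

    {{#ask: [[Category:Application]] [[Has status::Active]]
     |?Has owner
     |?Depends on
     |format=table
    }}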

What else is needed?

  • Semantic Annotation
    Currently the weakest part of MediaWiki (not specific to SMW) is the editing, which is not helped by the additional markup for semantic annotation. DataWiki (formerly SMW+) does a reasonable job of allowing semantic annotation on the wiki page in WYSIWYG mode. A decent annotation tool (perhaps porting DataWiki’s implementation) is needed to do this job better than the plain wiki markup in SMW.
    Note that most people solve this issue by using Semantic Forms to enter data instead. It is a good solution, but carries the danger of constraining information capture and limiting use (virtually turning SMW into SharePoint or an Access Database).
    This will require new development and/or porting of existing code.
  • Spreadsheet integration
    The reality is that most organisations still use and prefer Excel as the de-facto standard for gathering “somewhat” structured information. It is still the most effective way to request records of data (for example: application names, descriptions, owners, etc.) and to review data in many organisations.
    This will require new development.
  • Visualisation tool
    Most people prefer a visual representation of landscapes, designs, etc., and the preferred tool in most places is Visio. This will require new development.

Enterprise Architecture: The Common-Place-Book

The book I have just finished reading – Where Good Ideas Come From by Steven Johnson – had an interesting chapter on The Slow Hunch innovation pattern.

The part in there that really caught my attention was about the common-place-book. The following historical quote from John Mason in 1745 makes the case for organised (retrievable) thoughts:

Think it not enough to furnish this Store-house of the Mind with good Thoughts, but lay them up there in Order, digested or ranged under proper Subjects or Classes. That whatever Subject you have Occasion to think or talk upon you may have recourse immediately to a good Thought, which you heretofore laid up there under that Subject. So that the very Mention of the Subject may bring the Thought to hand; by which means you will carry a regular Common Place-Book in your Memory.

In the same chapter, the historian Robert Darnton is quoted on re-organising texts into fragments and removing the linearity of the text:

Unlike modern readers, who follow the flow of narrative from beginning to end, early modern Englishmen read in fits and starts and jumped from book to book. They broke texts into fragments and assembled them into new patterns by transcribing them in different sections of their notebooks. Then they reread the copies and rearranged the patterns while adding more excerpts. Reading and writing were therefore inseparable activities. They belonged to a continuous effort to make sense of things, for the world was full of signs: you could read your way through it; and by keeping an account of your readings, you made a book of your own, one stamped with your personality.

Later, in the Serendipity chapter – another pattern involving accidental connections – the author mentions DEVONthink, a tool to manage and organize all those disparate pieces of information. DEVONthink features a clever algorithm that detects subtle semantic connections between distinct passages of text.


Systems – an Ontology approach

The purpose of the Systems Ontology is to provide a framework, in the form of an ontology, for capturing system details resulting from systems analysis.

The Systems Ontology is composed of three distinct parts:

  • SystemThing holds the core concepts of a system – not to be changed
  • SystemSpace is a definition of categories applicable to different concepts describing specific systems – should change rarely as the understanding of the different types of systems evolves
  • System is the place for the instances of specific systems – may change regularly as the analysis and understanding of specific systems progresses.

The following diagram is a representation of the different elements in each part.


Core concepts and Spaces
Core concepts define the fundamental building blocks for the Systems Ontology in a set of abstract classes of mainly two types: Thing and Space.
Things describe the different aspects of a system on a very high abstraction level (meta-meta in this case).
Spaces provide further refinement for the different aspects of a system.

Structure
The structure of the system is described in abstract sub-classes of StructuredThing. The structure defines the relationships between concepts; it defines what forms of construct (in terms of graphs) can be built. Constructs can be, for example: a chain, ring, tree, net, a combination of these, or any other formation.

Specific structures are defined by inheriting from the StructuredThing abstract class, making StructuredThing a collection – a container – of various types.

An example structure: ComposableThing has two slots, has_part and is_part_of (each the other’s inverse); since has_part is a multi-value slot and is_part_of is a single-value slot referring to instances, the structure for ComposableThing is going to be a tree.
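Expressed in OWL/Turtle – an approximate translation of the frame model, with a hypothetical namespace – the same structure reads:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix :    <http://example.org/systems#> .

:ComposableThing a owl:Class .
:has_part   a owl:ObjectProperty ;
            owl:inverseOf :is_part_of .
# is_part_of is single-valued (functional), so composition forms a tree
:is_part_of a owl:ObjectProperty , owl:FunctionalProperty .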

Category
Categories are defined in the CategorySpace, which holds a hierarchy of categories underneath. CategoryThing enables instances of system elements to reference multiple categories from the CategorySpace.

CategorySpace is a bit unusual as far as modelling categories goes. Usually there is a meta-model describing the characteristics of a category on the meta-model level; then each category is an instance in the model, often in some form of relationship with other categories.
In this case a different, perhaps unusual approach is taken: categories are defined as classes and sub-classes of classes. This allows building a hierarchy of categories and using the class name as the name of the category, which should be sufficient as no attributes are necessary.
There is one more twist: instead of assigning a class (category) as a superclass to the categorized class, CategorizedThing defines a slot with the value type of a class; in other words, the category is defined as an attribute (slot).

This approach allows the categories to be defined in the meta-model in a hierarchy and still be used as an attribute in the model.

Life-cycle
The notion of life-cycle makes the concepts “alive”; in other words, it represents the time factor.

Individual life-cycles are captured in the LifeCycleSpace as sub-classes. Life-cycles consist of states; these are captured as instances of a life-cycle class in the ontology. LivingThing enables the system instances to reference individual states and other related instances – behaviour and quality (details in the next section).

Note that life-cycles are not described as proper state machines in this ontology. There is no notion of events or state transitions, other than registering the preceding and following states for each state instance.

Behavior and Quality
The following assumption was made: only “living” things (i.e. concepts with a life-cycle) show characteristics of behavior and qualities.
Based on this assumption, behaviour and quality details can be registered to any LivingThing.

Behavior and Quality are two concepts represented in the ontology in a similar fashion.
BehaviorSpace consists of sub-classes, where the sub-classes are considered categories of behavior. These categories can serve as a mechanism to group, qualify, and arrange the long list of different behaviors a system may have.
Specific behaviors are captured as instances of a Class from the BehaviorSpace.

Qualities are represented the same way, where categories are sub-classes of the QualitySpace and specific qualities are instances of a class from the QualitySpace.

System instances
The root for capturing specific system instances is SystemSpace; it inherits a set of abstract classes (LivingThing and CategorizedThing) describing a system.

Besides creating instances for system elements, a few other types of instances will be created along the way, including life-cycle states, behaviors, and qualities.

States
Life-cycles consist of states; individual state instances should be created under the relevant life-cycle categories.
States are later referenced from system instances, where a system can only have one state at a time in the current setup of the ontology.

Behaviours and Qualities
Behaviours and Qualities are instances of simple classes in the pertinent categories. In the current ontology each instance is represented with a name (slot) only.
The categories should be carefully chosen for both sets of concepts; otherwise it can be time-consuming to re-factor an instance of a quality into a category in order to register finer-grained qualities. The same challenge applies to registering behaviour.

Topology and Systems

Systems, and sub-systems, are captured in some form of hierarchy in the ontology – the Topology. The Topology for systems is not pre-defined, as it mostly depends on the method applied to systems analysis, and somewhat on the different types of systems as well.
The Topology is within the System ontology; it is captured together with the system instances, unlike the categories for life-cycle, behaviour, and quality.

System instances are created within the topology, where they automatically inherit characteristics of LivingThing and CategorizedThing.
The topology does not have a notion of structure for system elements; therefore, predefined structures (from the SystemSpace) should be applied as super-classes to specific topology classes. Note that sub-classes in the topology will inherit structures from parent classes; for that reason, structures should be used sparingly in classes closer to the root and applied instead to classes closer to the instances.

Data characteristics of system elements are defined within the Topology by adding them to specific classes. Data details are essentially attributes (slots here) on a class. One should be careful to make sure that data is not confused with structure. Slots with references to other system instances are perfectly valid data elements; however, these can easily be confused with structure elements, which are also represented as slots. The success of proper separation will depend on the rigour applied to systems analysis.


Code Generation 2008: Day 3

Last day of CG2008.

Keynote: The Domain Specific IDE by Steve Cook (Microsoft)
Steve introduced DSLs briefly, talked about various patterns of use, then moved on to the main topic: software development using DSLs and the evolution of software development. He depicted software evolution in quadrants and highlighted the path of evolution as: craftsmanship > mass production > continuous improvement > mass customization.
A few highlights from the session:

It makes sense to model concerns [IT Architecture].

[DSL, code generation] the only way to get some headway is to make it economical.

To create a language [DSL], it really means to create a tool. The language itself is pretty useless in itself.

We have recognized that Visual Studio is a platform not just an IDE.

Bran Selic: Standardization [software development] goes way beyond standardization of API. One way to do it right is to standardize on semantics.

Experience Report: Evangelizing Code Generation: A case study of incremental adoption by Brooke Hamilton (FM Global)
Brooke’s session was pleasantly different from the other sessions during the conference, as he was from the customer side and not a vendor or an academic. His presentation was brilliant: well structured and pleasantly constructed. Brooke had real details of his own and his team’s experience on their project(s) from the last few years. It was definitely one of the best presentations of the conference, and it sparked a lot of good discussions.

Code Generation Narcosis – you feel more powerful using code generation than you really are.

Code generation improves architecture. It makes you look for redundant parts and separate concerns.

We cannot make an application serving unknown needs, but we can generate a lot of code! 

Versioning of the templates [code generation] is absolutely crucial.

Tutorial: Strategies for generating code from Microsoft DSL tools and T4 text templates by Brooke Hamilton (FM Global)
This session was more of a demo using the Microsoft DSL Tools in the Visual Studio environment. Brooke also folded in some of his experience with the tool and how it applies to their projects at FM Global.

* * *

In summary, it was an excellent conference with a lot of great presentations from the top experts in this field, and great discussions during panels, BoFs, and Goldfish Bowls. There are a lot of new ideas, thoughts, and questions to take away, consolidate, and look for answers to before next year’s event.