Abstraction is a recurring theme in my work. While it is second nature for some, most folks have too strong of a footing in concrete 😀
Everybody can do abstraction. It is a skill, a mental muscle that needs training. Applying it regularly to situations and using it purposefully is a different matter. Here is a short, visual explanation of what I mean by applying abstraction purposefully.
The arrow pointing away from concrete indicates the idea of abstracting away. The further we are, the more abstract the concept becomes. Arrows shooting out in different directions signify that abstraction can happen along many different aspects.
The arrow pointing towards the centre indicates reducing abstraction. The closer we are to the centre, the more concrete it gets.
Exploring other aspects while remaining on the same level of abstraction.
Starting from something concrete and walking through levels of abstraction before descending back does not necessarily arrive at the same concrete point.
The irony of explaining abstraction in an abstract manner is not lost on me. Future posts will include more concrete and practical points.
You are a techie in a corporate. The platform you are working on offers a vast range of tools and capabilities. You find the next task/user story/requirement to work on. You immediately think of a way to implement a solution to complete the task and tick the box.
You have choices…
Are you going to discuss alternative implementations with your team?
Are you going to check if this implementation has been done somewhere else?
Are you going to review against the existing design standards and patterns?
Are you going to get it done as soon as you can to show high productivity on the dashboard?
If you are an individual contributor, your choices will be limited.
If you are managing a team, then ask yourself – What can I do to foster the former behaviour over the latter?
Either way, if you feel conflicted about this and prefer to have you and your team showing high productivity on some dashboard, then send a link to this article to your superiors.
In most cases IaC refers to some kind of template-based solution.
In this article, template refers to an AWS CloudFormation template.
It is strongly debatable if templates are code. If anything, code inside templates is generally an anti-pattern. Think of templates as an “intermediary language” rather than code. Templates are executed by respective platform “engines” driving actual cloud infrastructure configuration management. Consider it “code” as far as an asset managed in version control together with the rest of the source code.
Templates will be sufficient for most Infrastructure as Code (IaC) jobs. Most templating languages have constructs to express logic, for example conditions. Most also allow inline code to be included as part of the configuration; nonetheless, it is code mixed with template, for example an AWS Velocity template for DynamoDB.
Writing Infrastructure as Code templates this way – in YAML or JSON – has been reasonable, but not without concerns. A good article on the topic is In defense of YAML.
Templates for templates
AWS SAM is a good example of simplifying CloudFormation templates; notice the Transform: 'AWS::Serverless-2016-10-31' element in SAM templates. A single SAM element may transform into a list of CloudFormation elements, deploying a series of resources as a result. Another example is serverless.com: their templating language supports multiple platforms while also simplifying the templating compared to CloudFormation.
One of my concerns, however, is the mixing of Infrastructure as Code with Function as a Service definitions. For example, the definition of a function AWS::Serverless::Function sits in the same place as an API Gateway AWS::Serverless::Api and the related Usage Plan AWS::ApiGateway::UsagePlan. I would like to keep application and infrastructure concerns separate.
My immediate approach was going to be to split up the template into multiple files and use AWS::Include to bring them back together. However, AWS::Include does not work for all parts of the schema. Trying to use AWS::Include under Resources to include a set of functions simply does not work.
The AWS API gives access to the platform and resources through a range of SDK-s (Python SDK is called boto3). It is truly a low level access to resources that is typically used for developing software on the platform. Infrastructure could be managed using the SDK. There are many good automation examples on how AWS Lambda can react to events from the infrastructure and respond with changes to it. Managing more than a few resources using the SDK is not feasible, considering the coordination it would require: dependencies, delay in resource setup.
There is a better approach: code
The troposphere library allows for easier creation of the AWS CloudFormation JSON by writing Python code to describe the AWS resources.
The examples would lead you to believe that using troposphere is not much different from writing a CloudFormation or SAM template in Python. Depending on your use case, however, there are opportunities to explore:
Function as a Service (FaaS) implementations are heavily influenced by deployment. It would make sense if the deployment details were close – ideally right next – to the code.
In a similar fashion database – for example DynamoDB – resource definition could be close to the ORM or data access implementation.
Make infrastructure and deployment decisions in code. There are of course conditions in CloudFormation, and CloudFormation macros for more complex processing.
Having the infrastructure and deployment configuration close to the implementation code has its own pros and cons.
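As a minimal sketch of making a deployment decision in code rather than in the template: the function below builds a CloudFormation-style template and only adds a resource when a Python flag is set. Plain dicts stand in for troposphere objects so that the sketch runs without extra dependencies, and every name in it (build_template, include_cleanup, the logical IDs, handlers and paths) is hypothetical.

```python
import json


def build_template(include_cleanup: bool) -> dict:
    """Build a CloudFormation-style template as a plain dict.

    Hypothetical example: the decision to deploy the cleanup function is
    made here in Python, so the generated template needs no Conditions.
    """
    resources = {
        "UploadFunction": {
            "Type": "AWS::Serverless::Function",
            "Properties": {
                "Handler": "functions.upload.handler",  # hypothetical handler
                "Runtime": "python3.12",
                "CodeUri": "source/",
            },
        }
    }
    if include_cleanup:
        # Only emitted when the flag is set; otherwise it never reaches the template.
        resources["CleanupFunction"] = {
            "Type": "AWS::Serverless::Function",
            "Properties": {
                "Handler": "functions.cleanup.handler",  # hypothetical handler
                "Runtime": "python3.12",
                "CodeUri": "source/",
            },
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Transform": "AWS::Serverless-2016-10-31",
        "Resources": resources,
    }


if __name__ == "__main__":
    print(json.dumps(build_template(include_cleanup=False), indent=2))
```

The trade-off is that the decision is fixed when the template is generated, whereas a CloudFormation condition is evaluated at deployment time from a parameter value.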
Development time requirements
Ideally troposphere would only be used at development time (not at runtime if we can avoid it), therefore the deployment package would not include this library.
# Development requirements, not for Lambda runtime
# pip install -r requirements-dev.txt
I use template.py in the root of the project, the same place where the template.yaml (or .json) would be, for producing the CloudFormation template.
I have followed two patterns for defining platform resources in the source code.
Class type resource definition
The resource definitions are part of the class implementation.
Note that the troposphere library is only imported at the time of getting the resources. Then it is only used when generating the CloudFormation template. None of these would need to run in the lambda runtime.
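# the application component
class ApplicationComponent:
    # class attributes and methods

    @staticmethod
    def cf_resources():
        from troposphere import serverless
        # return an array of resources associated with the application component
        return [serverless.Function(...)]  # include all necessary parameters

# the template generator
from troposphere import Template
t = Template()
for r in ApplicationComponent.cf_resources():
    t.add_resource(r)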
Decorator type resource definition
I use decorators for Lambda function implementations. The decorator attaches a function to the decorated Lambda function; calling it returns the associated resources as an array.
# the wrapper
def cf_function(**func_args):
    def wrap(f):
        def wrapped_f(*args, **kwargs):
            return f(*args, **kwargs)

        def cf_resources():
            from troposphere import serverless
            # use relevant arguments from func_args
            function = serverless.Function(...)  # include all necessary parameters
            # add any other resources
            return [function]

        wrapped_f.cf_resources = cf_resources
        return wrapped_f
    return wrap

# the lambda function definition
@cf_function(...)  # add any arguments
def lambda1(event, context):
    ...

# the template generator
from troposphere import Template
t = Template()
for r in lambda1.cf_resources():
    t.add_resource(r)
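For readers who want to try the decorator pattern without installing troposphere, here is a self-contained sketch of the same idea. It is an illustration only: plain dicts stand in for troposphere resource objects, and the decorator arguments (name, runtime, timeout), the CodeUri value and the generate_template helper are all hypothetical.

```python
import functools
import json


def cf_function(**func_args):
    """Attach a cf_resources() callable to the decorated Lambda handler.

    Hypothetical stand-in: resources are plain dicts, not troposphere objects.
    """
    def wrap(f):
        @functools.wraps(f)
        def wrapped_f(*args, **kwargs):
            return f(*args, **kwargs)

        def cf_resources():
            # Map the logical ID to a definition built from the decorator arguments.
            return {
                func_args["name"]: {
                    "Type": "AWS::Serverless::Function",
                    "Properties": {
                        "Handler": f"{f.__module__}.{f.__name__}",
                        "Runtime": func_args.get("runtime", "python3.12"),
                        "Timeout": func_args.get("timeout", 3),
                        "CodeUri": "source/",  # hypothetical path
                    },
                }
            }

        wrapped_f.cf_resources = cf_resources
        return wrapped_f
    return wrap


@cf_function(name="Lambda1", timeout=10)
def lambda1(event, context):
    return {"statusCode": 200}


def generate_template(*handlers):
    """Collect the resources registered on each handler into one template."""
    resources = {}
    for handler in handlers:
        resources.update(handler.cf_resources())
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Transform": "AWS::Serverless-2016-10-31",
        "Resources": resources,
    }


if __name__ == "__main__":
    print(json.dumps(generate_template(lambda1), indent=2))
```

The handler itself stays callable as usual, and nothing related to template generation runs unless cf_resources() is called.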
AWS recently released the alpha version of their Cloud Development Kit (CDK). I have not tried it yet, but the Python CDK looks super similar to troposphere. Of course, they both represent the same CloudFormation resource definition in Python. troposphere or AWS CDK, they both bring a set of new use cases for managing cloud infrastructure and let us truly define Infrastructure as Code. I will explore the use of troposphere on my next project.
The AWS SAM and CloudFormation mix works well for my projects. I have been working mainly with Python for building Lambda functions on AWS. However, managing code from project to production has been less than trivial. AWS Lambda Deployment Package in Python
This article and the sample project on GitHub show how to
structure a Python project for Serverless functions,
3rd party libraries, dependencies, are expected to be in the project root.
All project assets in the root will be deployed.
I want to maintain a virtualenv for development and define different dependencies (requirements.txt) for the runtime.
I use the PyCharm IDE for development. My projects have their own virtual environments, and I use the following project structure.
source – all the deployable source code
    ext – 3rd party libraries, not a Python package
        requirements.txt – use it to install dependencies
    lib – my own shared libraries
        __init__.py – it is a Python package
        vendor.py – library for vendoring, more on this later
    functions – all function(s) code
        __init__.py – it is a Python package
    I may add other packages like models for ORM
    runtime_context.py – more on this later
    requirements.txt – this one is kept empty for SAM build
dist – created by SAM build for the deployment
requirements.txt – development-time dependencies for the project
template.yaml – AWS serverless application template
Vendoring is a technique to incorporate 3rd party packages, dependencies into the project. It is a neat trick used in other languages, and this specific one is adopted from Google’s App Engine library.
Create a directory in the root of the project for the 3rd party packages ext. Add ext to your Python path so dependencies resolve during development.
Create and maintain the requirements.txt inside ext for the deployed runtime.
Install the packages in the ext directory:
pip install -t . -r requirements.txt
How to make the code in the ext directory available to the runtime? This is where Google’s helper – google.appengine.ext.vendor – comes in. It adjusts the path for the Python runtime.
Add the code to a project file, for example: /lib/vendor.py
My approach is then to create the runtime_context.py in the root of the project:
import os
from lib import vendor
vendor.add(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'ext'))
Any Python application that needs access to the packages in the ext directory needs a single line of import:
import runtime_context
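To make the mechanism less of a black box, here is a simplified sketch of what a vendor.add helper could look like. It is not Google's implementation, only an approximation of the core idea of putting the vendor directory on the import path; the index parameter and the error message are my own assumptions, and unlike a helper built on site.addsitedir it does not process .pth files.

```python
import os
import sys


def add(path: str, index: int = 1) -> None:
    """Put a vendor directory on sys.path so its packages can be imported.

    Simplified sketch, not the google.appengine.ext.vendor implementation.
    """
    path = os.path.abspath(path)
    if not os.path.isdir(path):
        raise ValueError(f"vendor directory does not exist: {path}")
    if path not in sys.path:
        # Insert near the front so vendored packages are found early,
        # while leaving the first entry (usually the script directory) alone.
        sys.path.insert(index, path)
```

Calling add() a second time with the same directory is a no-op, so importing runtime_context from several modules is harmless.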
Using the runtime_context.py
I use the runtime_context.py to set up vendoring for all functions. I also place shared configurations and variables here, for example:
Presigned URL-s allow anyone – unauthenticated users – to access S3 objects for a specified duration. In this case the application allows upload (PUT) and download (GET) of S3 objects.
Presigned URL-s also allow the client to use S3’s API directly. There is no need to go through Lambda for uploading or downloading files, which could incur high costs. Since Lambda’s cost is time based, transferring a large amount of data over a slow connection would eat up a lot of compute time.
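A minimal sketch of generating such URL-s with Boto 3 could look like the following. The helper name presigned_urls, its parameters and the default expiry are hypothetical; the S3 client is passed in so that the function can be exercised without AWS credentials, and in a real function it would come from boto3.client("s3").

```python
def presigned_urls(s3_client, bucket: str, key: str, expires_in: int = 300) -> dict:
    """Return presigned upload (PUT) and download (GET) URLs for one object.

    s3_client is expected to behave like boto3.client("s3").
    """
    upload_url = s3_client.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # validity in seconds
    )
    download_url = s3_client.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
    return {"upload": upload_url, "download": download_url}
```

The Lambda function only signs the request; the file bytes travel directly between the browser and S3.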
Approach to infrastructure as code and SAM’s intricacies
API Design and Swagger specifics
Cross-cutting concerns: securing API with a key
AWS Lambda in Python
Python has first class support on the platform. The AWS SDK is known as Boto 3 and it is bundled with the Lambda runtime environment, so there is no need to include it as a dependency. However, Lambda does not include the very latest version of the Boto 3 library (at the time of this writing).
UPDATE: SAM’s support for Python has a few gotchas when including 3rd party libraries. More about this in the article dedicated to AWS Lambda in Python.
The diagram shows all services included in the app.
I was hoping to make use of an Object Mapper library that abstracts away the DynamoDB low-level API. There are a few good candidates out there such as PynamoDB and Flywheel.
After a short evaluation, I ended up coding up my own lightweight, albeit less generic abstraction, see /source/lib/ddb.py. On a larger project with multiple entities I would definitely use one of the OM libraries.
I originally wanted to build the application to support multi-tenancy, but decided to leave that for another project where it would make more sense. However, supporting some of the multi-tenant strategies (SaaS Solutions on AWS: Tenant Isolation Architectures) with these libraries is not trivial, or simply not possible – something to remember.
AWS S3 (Simple Storage Service)
The S3 API is significantly simpler than the DynamoDB API. It was simple enough to use directly from the functions without any abstraction.
When I started, I did not realise that S3 uploads recognise the mime-type of the files. I was going to use python-magic (magic.from_buffer) for this. It turned out not to be necessary, as S3 objects have a ContentType attribute for this.
The file uploads do not preserve the original file name when placed into S3; they have the same name as the DynamoDB key. When the Presigned URL is generated for the download, the ResponseContentDisposition attribute can set the file name for the download.
AWS Lambda supports a range of S3 events as triggers. This, and all the other event sources, make the AWS platform and the Serverless model really powerful. The application uses the s3:ObjectCreated event to update the DynamoDB item with properties of the S3 object (file) such as size and mime-type.
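As a sketch of the file name handling, the helper below builds the Params for a get_object presigned URL so that the browser saves the download under the original name. The function name download_params and the rather blunt sanitising of quotes and backslashes are my own assumptions, not code from the project.

```python
def download_params(bucket: str, key: str, filename: str) -> dict:
    """Build get_object parameters that restore the original file name.

    The result is meant to be passed as Params to generate_presigned_url
    with ClientMethod="get_object".
    """
    # Keep the header value well-formed: drop characters that would break the quoting.
    safe_name = filename.replace("\\", "_").replace('"', "_")
    return {
        "Bucket": bucket,
        "Key": key,
        "ResponseContentDisposition": f'attachment; filename="{safe_name}"',
    }
```

S3 then returns the Content-Disposition header with the response, even though the object itself is stored under the DynamoDB key.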
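The first half of such a handler, reading the object details out of the S3 notification, can be sketched as below. The function name uploaded_objects is hypothetical. The event carries the bucket, key and size; as far as I can tell the mime-type is not part of the notification, so it would have to come from a separate lookup such as head_object, which is left out here.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event: dict) -> list:
    """Extract bucket, key and size from the ObjectCreated records of an S3 event."""
    items = []
    for record in event.get("Records", []):
        # Inside the record the event name looks like "ObjectCreated:Put".
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3 = record["s3"]
        items.append(
            {
                "bucket": s3["bucket"]["name"],
                # Object keys arrive URL-encoded, with spaces as "+".
                "key": unquote_plus(s3["object"]["key"]),
                "size": s3["object"].get("size"),
            }
        )
    return items
```

Each returned item has what is needed to update the matching DynamoDB item.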
Most likely the browser client is going to be on a different domain than the API and S3, therefore CORS settings are necessary to make this application work.
There are two steps for the API to work with CORS:
Create an endpoint for the OPTIONS method and return the necessary CORS headers. The API Gateway makes this very easy using a mock integration. This is configured in the Swagger API definition.
Return the Access-Control-Allow-Origin header in the Lambda response.
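The second step can be sketched with a small helper that every function uses to build its response. This assumes the Lambda proxy integration response format; the helper name cors_response and the wildcard default origin are hypothetical, and a real deployment would likely restrict the origin to the domain of the web client.

```python
import json


def cors_response(body, status_code: int = 200, allowed_origin: str = "*") -> dict:
    """Build a Lambda proxy response that carries the CORS header."""
    return {
        "statusCode": status_code,
        "headers": {
            "Access-Control-Allow-Origin": allowed_origin,
            "Content-Type": "application/json",
        },
        "body": json.dumps(body),
    }


def handler(event, context):
    # Hypothetical handler: the payload is a placeholder.
    return cors_response({"status": "ok"})
```

Without this header the browser discards the response, even when the OPTIONS preflight succeeded.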
CORS configuration for S3 resources has a specific place in the CloudFormation template: CorsConfiguration.
If the application is configured at the time of deployment to store uploads immediately – StoreOnLoad=True – then the FileExpireFunction function is not needed.
CloudFormation Condition facility allows control over what gets deployed, amongst other conditional behaviours. In this project, depending on the parameter value, the expire function may or may not get deployed.
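To illustrate how the pieces reference each other, here is a sketch of the relevant template parts written as a Python dict, which serialises to the JSON form of a template. The StoreOnLoad parameter and the FileExpireFunction resource are named in the text above; the condition name CreateExpireFunction, the allowed values and the function properties are assumptions made for the example.

```python
import json

template_fragment = {
    "Parameters": {
        "StoreOnLoad": {
            "Type": "String",
            "AllowedValues": ["True", "False"],  # assumed values
            "Default": "False",
        }
    },
    "Conditions": {
        # True only when uploads are NOT stored immediately.
        "CreateExpireFunction": {"Fn::Equals": [{"Ref": "StoreOnLoad"}, "False"]}
    },
    "Resources": {
        "FileExpireFunction": {
            "Type": "AWS::Serverless::Function",
            # The resource is skipped when the condition evaluates to false.
            "Condition": "CreateExpireFunction",
            "Properties": {
                "Handler": "functions.expire.handler",  # hypothetical handler
                "Runtime": "python3.12",
                "CodeUri": "source/",
            },
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(template_fragment, indent=2))
```

The condition is evaluated by CloudFormation at deployment time, so the same template serves both configurations.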
I picked FilePond for the browser client. It offers a high degree of customisation, and comes with a good set of capabilities.
The server API interaction had to be customised to work with the Filestore and S3 API-s. I implemented the customisation in a wrapper library – static/web/uploader.js. It takes care of the uploading (process) and deleting (revert) of files.
The sample webpage and form static/web/index.html is built using jQuery and Bootstrap to keep it simple. The form has a single file upload managed by the FilePond widget. In this example there is no backend to pick up the data from the form.
See the README.md Web app client section for more details on how to deploy the sample web app client on S3 as a static site.
I have experimented with two types of tests for this project: unit and integration.
Unit tests are fairly straightforward in Python. The interesting bit here is the stubbing of AWS services for the tests. botocore has a Stubber class that can stub practically any service in AWS.
There is one unit test implemented for the preprocess function, which shows how to stub the DynamoDB and S3 services.
See tests/unit/file.py for more detail on the specific test code.
I have found Tavern ideal for most of my integration testing scenarios.
REST API testing has first class support. Defining tests with multiple steps (stages) in YAML is easy.
There are 4 integration tests defined: upload, store, delete, download. These tick the boxes on the preprocess, store, delete, info, uploaded functions. However, they do not help with testing functions like expire, which is triggered by a scheduled event only.
Tavern can pick up and use environment variables in the scripts. See the README.md Test section for more details on how to set up and run the integration tests.
What about the non-REST API-s?
I am reluctant to add a REST endpoint to functions such as expire just for the sake of testability.
The aws CLI can invoke Lambda functions directly and so can Boto 3 via an API call – Lambda.Client.invoke. If there was a way to include non-REST Lambda function invocations in Tavern test cases, that would be ideal. Tavern supports plugins for adding new protocols – it has REST and MQTT added already. I wonder if it is feasible to build a plugin to support Lambda invocations?
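Outside of Tavern, a direct invocation from a test script can be sketched as follows. The helper name invoke_function and the shape of its return value are hypothetical; the client is passed in so the helper can be tried against a fake, and in practice it would be boto3.client("lambda").

```python
import json


def invoke_function(lambda_client, function_name: str, payload: dict) -> dict:
    """Invoke a Lambda function synchronously and decode its JSON result.

    lambda_client is expected to behave like boto3.client("lambda").
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function to finish
        Payload=json.dumps(payload).encode("utf-8"),
    )
    raw = response["Payload"].read()
    return {
        "status": response["StatusCode"],
        # FunctionError is only present when the function itself failed.
        "error": response.get("FunctionError"),
        "result": json.loads(raw) if raw else None,
    }
```

A test for expire could call this with a payload shaped like the scheduled event and then assert on the state of DynamoDB and S3.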
Final thoughts on the architecture
The serverless architecture worked very well for this app. In the end the amount of code was relatively small considering the functionality the app provides.