When creating a website whose editors will be working on content with photos, a problem often arises: how to manage the images so that the editor doesn’t have to manually edit them when they want to publish the same content with a different look? In Drupal, we can hit this problem when creating new view modes for any entity with images. The purpose of view modes is to serve the same content in a different form than the default one. For text or date fields, we will use different formatters. And for images?
Crop API and UI modules using this API
Drupal Core allows us to crop images without additional modules, but this functionality isn’t flexible enough to suit all needs. The Crop API Drupal module provides a basic API for more customized cropping of images. Its interface is used, for example, by Image Widget Crop, which provides a UI for cropping images using predefined crop types. That combination is very useful for websites that publish articles with images or for media management websites.
Another module using Crop API is Focal Point. It allows you to determine which part of the image is the most important. This fragment is used when cropping or scaling an image so that – for example – one of the people present doesn’t lose their head.
The Crop API Drupal module – general information
The module was released on 17 November 2014, and its latest version, 2.2, was published on 18 February 2022. The module is still actively supported and developed. It’s compatible with PHP 8.1 and with Drupal ^8.8 and 9.
Popularity of Crop API
The module has more than 90 thousand installations. That number grows every week and has trended upward since the first release. It took the newer 2.x branch three years to displace the older 1.x branch. Currently, the vast majority of websites using Crop API run the newer version.
Source: Drupal.org
Authors of Crop API
The original creator of the module is Janez Urevc (slashrsm). He works as a Senior Performance Engineer at Tag1 Consulting, where he develops and maintains web applications. He is an active member of the Drupal community – in 2014 he helped launch the media initiative for Drupal 8, where he worked with other community members to bring media in Drupal to an upgraded level.
The Crop API module is officially supported by MD Systems GmbH, and the main maintainers of the module, in addition to Janez, are Adam G-H (phenaproxima), Alexandre Mallet (woprrr), and the Drupal Media Team.
Installation
Crop API doesn’t require the installation of additional libraries. It only has dependencies on the Image and User modules, which are part of Drupal’s core. The installation is therefore carried out in the standard way. As always, we recommend installing the module using Composer.
composer require 'drupal/crop:^2.2'
The module provides two new permissions: Administer crop settings, allowing you to manage basic Crop API settings, and Administer crop types, which allows you to add, delete and edit defined crop types.
Use of the Crop API module
As we pointed out in the introduction, the Crop API module alone doesn’t allow for much. It should be seen as an interface that other modules can use. Nevertheless, it has several configuration options that we’ll try to explain.
Crop API provides a new entity type – Crop type. In this entity, we define the crop types we want to use.
When adding a new crop type, we can configure several settings.
The soft limit only displays a warning when the selected crop area is smaller than the defined dimensions, while the hard limit prevents the editor from creating a crop smaller than those dimensions.
Hooks
Crop API provides one additional hook: hook_crop_entity_provider_info_alter(). With it, we can change the information about the entity provider, which is calculated by default in the class Drupal\crop\Annotation\CropEntityProvider. In the hook, we have access to the $providers array. We can change it in order to, for example, edit the media provider title.
Extension modules
Crop API was created to serve as an interface that other modules can use. To obtain the full range of possibilities, it’s necessary to select one of the extension modules that best suits your needs in terms of functionality.
Image Widget Crop
The module provides a widget that allows the user to select one of the predefined crop types. It has a responsive interface for changing the crop type and for adjusting the crop manually.
Focal Point
The module allows us to specify the key point of an image, which will be treated as its center during the cropping process. If you’ve ever had the most important part of an image cut off by a scale-and-crop image style, this module will prove to be a lifesaver.
The Crop API Drupal module – summary
Crop API is a useful tool that, in combination with the modules that build on it, provides flexible image-cropping functionality. Installing it is recommended if your website requires more flexible cropping than the options available in the core.
Are you considering the choice of modules for your project? We’d be happy to suggest which tools would be most suitable for it. On a daily basis, we develop websites on Drupal, using a number of contributed modules and creating our own.
Damien Miller (djm@) has committed support for the home-directory request to sftp-server(8):
CVSROOT: /cvs
Module name: src
Changes by: djm@cvs.openbsd.org 2022/08/11 23:20:28
Modified files:
usr.bin/ssh : sftp-server.c PROTOCOL
Log message:
sftp-server: support home-directory request
Add support to the sftp-server for the home-directory extension defined
in draft-ietf-secsh-filexfer-extensions-00. This overlaps a bit with the
existing expand-path@openssh.com, but uses a more official protocol name,
and so is a bit more likely to be implemented by non-OpenSSH clients.
From Mike Frysinger, ok dtucker@
The ELISA (Enabling Linux in Safety Applications) Project announced that Boeing has joined as a Premier member, marking its commitment to Linux and its effective use in safety critical applications.
This alpha release reverts extensive whitespace changes to --help output, so
as not to annoy translators (thanks, Benno Schulenberg!). These will be
restored before the next stable release in a “whitespace-only” change.
App stores require that Open Source developers constantly jump through ever-changing hoops. This is an unsustainable demand. Read a proposal on how to change that.
Federal election regulators voted Thursday to allow Google to proceed with a plan to make it easier for campaign emails to bypass spam filters. Google’s proposal to run a pilot project changing the filters for political emails came after intense Republican criticism that spam filters were biased against conservatives, a charge the tech giant denies. In a sign of public disgust with spam, the Federal Election Commission received thousands of public comments urging it to deny the request. But a majority of the six-member commission decided that Google’s project did not constitute an improper in-kind political contribution that would violate federal campaign finance laws. This reminds me of Twitter admitting it won’t ban nazis because that would mean banning accounts of Republican politicians. I remember the days when being biased against nazis was a good thing. Times sure do change.
The Gurene Wikimedia Community held a pre-Wikimania event at the Ajumako campus of the University of Education, Winneba. The event, comprising 22 participants from the…
(I put Custom Workflow Orchestration In Python in an Art Generator and the above popped out)
This post originally appeared on my employer VISO Trust’s blog. It is lightly edited and reproduced here.
On the Data & Machine Learning team at VISO Trust, one of our core goals is to provide Document Intelligence to the audit team. Every Document that passes through the system is subject to collection, parsing, reformatting, analysis, reporting and more. Every day, we work to expand this feature set, increase its accuracy and deliver faster results.
Why we needed workflow orchestration
Many individual tasks are executed to eventually produce what Document Intelligence provides, including but not limited to:
Security Control Language Detections
Audit Framework Control ID Detections
Named Entity Extraction of organizations, dates and more
Decryption of encrypted PDFs
Translation of foreign-language PDFs
Document Classification
Document Section Detection
Until our workflow orchestration implementation, the features listed above and more were all represented in code inside a single function. Over time, this function became unwieldy and difficult to read, with snippets of ceremony, controls, logging, function calls and more sprinkled throughout. Moreover, this is one of the most important areas of our app, where new features will be implemented regularly, so the need to clean this code up and make it easier to reason about became clear. Furthermore, execution inside this function occurred sequentially even though some of its function calls could run in parallel. While parallel execution isn’t strictly required in the current state, we knew that features on the near-term roadmap would necessitate it. We therefore had two requirements:
task execution that is easier to reason about and
the ability to execute in parallel
We knew we needed to either use an existing workflow orchestration tool or build one ourselves. We began with a rough analysis of what was going on in our main automation function: we formalized each ‘step’ into a concept called a Task and theorized about which Tasks could execute in parallel. At the time of the analysis, we had 11 Tasks, each of which required certain inputs and produced certain outputs; based on those inputs and outputs, we determined that a number could run in parallel. With this context, we reviewed two of the major open source Python toolkits for workflow orchestration: Luigi and Apache Airflow.
Both of these toolkits are designed for managing workflows with tens, hundreds, or even thousands of tasks that can take days or weeks to finish. They have complex schedulers, user interfaces, failure modes, options for a variety of input and output modes, and more. Our pipeline will reach this level of complexity someday, but with an 11-Task pipeline we decided that these toolkits added too much complexity for our use. We resolved to build a custom workflow orchestration toolkit, guided by the deep knowledge embedded in these more advanced tools.
Our custom workflow orchestration
The first goal was to generalize all of the steps in our automation service into the concept of a Task. A few examples of a Task would be:
detecting a document’s language,
translating a foreign language document,
processing OCR results into raw text,
detecting keywords inside text,
running machine learning inference on text.
Just reading this list gives one a feel for how each Task depends on a previous Task’s output to run. Being explicit about dependencies is core to workflow orchestration, so the first step in our Task concept was defining which inputs a given Task requires and which outputs it will produce. To demonstrate Tasks, we will develop a fake example Task called DocClassifyInference, the goal of which is to run ML inference to classify a given document. Imagine that our model uses both images of the raw PDF file and the text inside it to make predictions. Our Task, then, will require the decrypted PDF and the paginated text of the PDF in order to execute. Further, when it’s complete it will write a file to S3 containing its results. Thus, the start of our example Task might look like:
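A minimal sketch of that starting point; the key names here are illustrative, inferred from the description above rather than taken from the production code:

class DocClassifyInference(S3Task):
    """Classify a document using its decrypted PDF and its paginated text."""

    # Keys this Task reads from the shared pipeline state.
    input_keys = {"decrypted_pdf", "paginated_text"}

    # Keys this Task promises to write back to the pipeline state.
    output_keys = {"doc_classification"}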
DocClassifyInference subclasses S3Task, an abstract class that enforces defining a method to write to S3. S3Task itself is a subclass of the Task class, which enforces that subclasses define input keys, output keys and an execute method. The keys are enforced in a Pipeline class:
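A simplified sketch of those pieces, with the abstract methods and the key enforcement reduced to the essentials:

from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Set


class Task(ABC):
    """A single unit of work with declared inputs and outputs."""

    input_keys: Set[str] = set()
    output_keys: Set[str] = set()

    @abstractmethod
    def execute(self, pipeline: "Pipeline") -> "Pipeline":
        """Read input_keys from the pipeline state and write output_keys back."""


class S3Task(Task):
    """A Task that also persists a result object to S3."""

    @abstractmethod
    def write_to_s3(self, data: bytes, key: str) -> None:
        """Upload a result to S3; the destination details are left to subclasses."""


class Pipeline:
    """In-memory state manager handed from Task to Task."""

    def __init__(self, initial_state: Optional[Dict[str, Any]] = None):
        self._state: Dict[str, Any] = dict(initial_state or {})

    def get(self, key: str) -> Any:
        if key not in self._state:
            raise KeyError(f"Input '{key}' has not been produced by any Task yet")
        return self._state[key]

    def set(self, key: str, value: Any) -> None:
        self._state[key] = value

    def validate(self, task: Task) -> None:
        """Enforce that a Task's declared inputs exist before it runs."""
        missing = task.input_keys - self._state.keys()
        if missing:
            raise ValueError(f"{type(task).__name__} is missing inputs: {missing}")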
This Pipeline becomes the object that manages state as our Tasks execute. In our case we were not approaching memory limits, so we decided to keep much of the Task state in memory, though this could easily be changed to always write to and read from storage. As a state manager, the Pipeline can also capture, before any Tasks execute, the ceremony that downstream Tasks may require.
Continuing with DocClassifyInference: as a subclass of the abstract class Task, it also has to implement def execute (enforced by Task). This method takes a Pipeline and returns a Pipeline. In essence, it receives the state manager, modifies the state and returns it so the next Task can operate on it. In our example, execute will extract the decrypted PDF and the paginated text so they can be used as inputs for an ML model that performs document classification. Let’s look at the entire stubbed-out DocClassifyInference:
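Stubbed out under the same assumptions, with the model call and the S3 upload as placeholders:

import json


class DocClassifyInference(S3Task):
    """Classify a document using its decrypted PDF and its paginated text."""

    input_keys = {"decrypted_pdf", "paginated_text"}
    output_keys = {"doc_classification"}

    def execute(self, pipeline: Pipeline) -> Pipeline:
        pipeline.validate(self)

        decrypted_pdf = pipeline.get("decrypted_pdf")
        paginated_text = pipeline.get("paginated_text")

        # Placeholder for the real ML inference call.
        classification = self._classify(decrypted_pdf, paginated_text)

        pipeline.set("doc_classification", classification)
        self.write_to_s3(json.dumps(classification).encode("utf-8"),
                         key="doc-classification.json")
        return pipeline

    def _classify(self, decrypted_pdf: bytes, paginated_text: list) -> dict:
        # Stub: run the document-classification model here.
        return {"label": "unknown", "confidence": 0.0}

    def write_to_s3(self, data: bytes, key: str) -> None:
        # Stub: upload the serialized result to S3 here.
        pass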
It’s easy to see how DocClassifyInference gets the Pipeline state, extracts what it needs, operates on that data, sets what it has declared it’s going to set and returns the Pipeline. This allows for an API like this:
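With hypothetical DecryptPDF and PaginatedText Tasks standing in for the earlier steps, roughly:

pipeline = Pipeline(initial_state={"raw_pdf": b"%PDF-1.7 ..."})  # placeholder bytes

# Each Task receives the state manager, modifies it and returns it.
pipeline = DecryptPDF().execute(pipeline)
pipeline = PaginatedText().execute(pipeline)
pipeline = DocClassifyInference().execute(pipeline)

result = pipeline.get("doc_classification")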
This, of course, was much cleaner than what we had previously. It also lends itself to writing easy, understandable unit tests per Task and adheres more closely to functional programming principles. So this solves our first goal of making the code cleaner and easier to reason about. What about parallel processing?
Parallel Processing
Similar to Luigi and Apache Airflow, the goal of our workflow orchestration is to generate a topologically sorted Directed Acyclic Graph (DAG) of Tasks. In short, having each Task explicitly define its required inputs and intended outputs allows the Tasks to be sorted for optimal execution. We no longer need to write the Tasks down in sequential order like the API described above; rather, we can pass a Task Planner a list of Tasks and let it decide how to execute them optimally. What we want, then, is a Task Planner that is passed a list of Tasks, sorts them topologically and returns a list where each member is itself a list of Tasks. Let’s take a look at what this might look like using some of our examples from above:
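As a deliberately unordered list of the example Tasks (RunDocInference plays the role of DocClassifyInference from earlier):

tasks = [
    CreateCSVOutput(),    # needs the doc classification and the keyword matches
    KeywordDetection(),   # needs the paginated text
    RunDocInference(),    # needs the decrypted PDF and the paginated text
    PaginatedText(),      # needs the decrypted PDF
    DecryptPDF(),         # needs nothing
]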
Here I have retained our examples while adding two new Tasks: KeywordDetection and CreateCSVOutput. You can think of these as matching keywords in the paginated text and combining the results of RunDocInference and KeywordDetection into a formatted CSV output. When the Task Planner receives this list, we’ll want it to topologically sort the tasks and output a data structure that looks like this:
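Roughly (stage comments added for readability):

expected_task_plan = [
    [DecryptPDF()],                           # stage 1
    [PaginatedText()],                        # stage 2
    [RunDocInference(), KeywordDetection()],  # stage 3 -- can run in parallel
    [CreateCSVOutput()],                      # stage 4
]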
In the above list, you can think of each member as a ‘stage’ of execution. Each stage has one to many Tasks; with one Task, execution occurs sequentially, and with many, execution occurs in parallel. In English, the expected_task_plan can be described like so:
DecryptPDF depends on nothing and creates a consumable PDF,
PaginatedText depends on a consumable PDF and creates a list of strings,
RunDocInference depends on both and classifies the document,
KeywordDetection depends on the paginated text and produces keyword matches,
CreateCSVOutput depends on the document classification and the keyword matches and produces a formatted CSV of their outputs.
An example of the function that creates the expected_task_plan above might look like:
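A sketch of such a planner using graphlib; for simplicity, the Task list is passed in explicitly here and the duplicate-output check is inlined:

from graphlib import TopologicalSorter
from typing import Dict, List


def get_task_plan(tasks: List[Task]) -> List[List[Task]]:
    """Topologically sort Tasks into stages whose members can run in parallel."""
    # No two Tasks may produce the same output key.
    producers: Dict[str, Task] = {}
    for task in tasks:
        for key in task.output_keys:
            if key in producers:
                raise ValueError(f"Output key '{key}' is produced by more than one Task")
            producers[key] = task

    # A Task depends on whichever Tasks produce its input keys.
    sorter: TopologicalSorter = TopologicalSorter()
    for task in tasks:
        dependencies = {producers[key] for key in task.input_keys if key in producers}
        sorter.add(task, *dependencies)

    # Drain the sorter in batches: each batch becomes one stage of the plan.
    plan: List[List[Task]] = []
    sorter.prepare()
    while sorter.is_active():
        ready = list(sorter.get_ready())
        plan.append(ready)
        sorter.done(*ready)
    return plan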
This function takes the list of Tasks, ensures that no two Tasks produce identical output keys, adds the nodes to a sorter by interrogating each Task’s input_keys and output_keys, and sorts them topologically. In our case the sorter comes from graphlib’s TopologicalSorter, which is documented in the Python standard library. Going into what each of these helpers is doing would take us too far afield, so we will move on to executing a task plan.
With the expected_task_plan shown above, an execute_task_plan() function is straightforward:
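A sketch, reusing the Pipeline from above (TaskThread is explained below):

from typing import List


def execute_task_plan(task_plan: List[List[Task]], pipeline: Pipeline) -> Pipeline:
    """Run stages in order; Tasks within a stage run in parallel threads."""
    for stage in task_plan:
        if len(stage) == 1:
            # A lone Task in a stage is simply executed sequentially.
            pipeline = stage[0].execute(pipeline)
        else:
            # Several independent Tasks: one thread per Task, all sharing the state.
            threads = [TaskThread(task, pipeline) for task in stage]
            for thread in threads:
                thread.start()
            for thread in threads:
                thread.join()
    return pipeline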
Here we iterate over the task plan, deciding between sequential and parallel execution. In the latter case, we use Python’s threading.Thread to create a thread per task and the idiomatic methods for starting and joining threads. Wait, then what is TaskThread?
In our case, we wanted to ensure that an exception in a child thread will always be raised to the calling thread so the calling thread can exit immediately. So we extended the threading.Thread class with our own class called TaskThread. Overriding threading.Thread’s .run() method is fairly common (so common that it’s suggested in run()’s comments); we overrode run() to set an instance variable carrying an exception’s content and then we check that variable at .join() time.
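A sketch of that subclass:

import threading
from typing import Optional


class TaskThread(threading.Thread):
    """A Thread that re-raises any exception from its Task when join() is called."""

    def __init__(self, task: Task, pipeline: Pipeline):
        super().__init__()
        self.task = task
        self.pipeline = pipeline
        self.exception: Optional[BaseException] = None

    def run(self) -> None:
        try:
            self.task.execute(self.pipeline)
        except BaseException as exc:
            # Remember the failure so the calling thread can see it.
            self.exception = exc

    def join(self, timeout: Optional[float] = None) -> None:
        super().join(timeout)
        if self.exception is not None:
            raise self.exception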
The calling thread can now try/except at .join() time.
Conclusion
With these structures in place, the file containing the automation service’s primary functions was reduced from ~500 lines to ~90. Now, when we create our thread pool to consume SQS messages, we get the Task plan like so: task_plan = get_task_plan(), and pass the task_plan into each thread. Once execution reaches the main function for performing document intelligence, what was previously a large section of difficult-to-read code becomes:
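In spirit, something like this (the function name and message attribute below are hypothetical):

from typing import List


def perform_document_intelligence(message, task_plan: List[List[Task]]) -> Pipeline:
    """Build the initial Pipeline state from the incoming message and run the plan."""
    # 'message.pdf_bytes' is a hypothetical attribute of the consumed SQS message.
    pipeline = Pipeline(initial_state={"raw_pdf": message.pdf_bytes})
    return execute_task_plan(task_plan, pipeline)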
The introduction of parallel processing of these Tasks consistently shaved time off performing document intelligence (an average of about a minute). The real benefit of this change, however, will come in the future as we add more and more Tasks to the pipeline that can be processed in parallel.
While we’ve reduced the time-to-audit significantly from the former state-of-the-art, we are definitely not done. Features like the above will enable us to continue reducing this time while maintaining consistent processing times. We hope this blog helps you in your workflow orchestration research.