Droptica: How to Effectively Clip Photos? An Overview of the Crop API Drupal Module

When creating a website whose editors will be working on content with photos, a problem often arises: how to manage the images so that the editor doesn’t have to manually edit them when they want to publish the same content with a different look? In Drupal, we can hit this problem when creating new view modes for any entity with images. The purpose of view modes is to serve the same content in a different form than the default one. For text or date fields, we will use different formatters. And for images?

Crop API and UI modules using this API

Drupal Core allows us to crop images without additional modules, but this functionality isn't flexible enough to suit all needs. The Crop API Drupal module provides a basic API for more customized cropping of images. Its interface is used, for example, by Image Widget Crop, which provides a UI for cropping images using predefined crop types. It's very useful for websites that publish articles with images or for media management websites.

Another module using Crop API is Focal Point. It allows you to determine which part of the image is the most important. This fragment is used when cropping or scaling an image so that – for example – one of the people present doesn’t lose their head.

The Crop API Drupal module – general information

The module was released on 17 November 2014, and its latest version, 2.2, was published on 18 February 2022. The module is still actively supported and developed. It's compatible with PHP 8.1 and with Drupal ^8.8 and 9.

Popularity of Crop API

The module has more than 90 thousand installations, and the number grows every week, following an upward trend since the first release. It took the newer 2.x version three years to displace the older 1.x version. Currently, the vast majority of websites using Crop API run the newer version.

The statistics presenting numbers of installations of the Crop API Drupal module

Source: Drupal.org

Authors of Crop API

The original creator of the module is Janez Urevc (slashrsm). He works as a Senior Performance Engineer at Tag1 Consulting, where he develops and maintains web applications. He is an active member of the Drupal community: in 2014 he helped launch the media initiative for Drupal 8, working with other community members to take media handling in Drupal to the next level.

The Crop API module is officially supported by MD Systems GmbH, and the main maintainers of the module, in addition to Janez, are Adam G-H (phenaproxima), Alexandre Mallet (woprrr), and the Drupal Media Team.

Installation

Crop API doesn’t require the installation of additional libraries. It only has dependencies on the Image and User modules, which are part of Drupal’s core. The installation is therefore carried out in the standard way. As always, we recommend installing the module using Composer.

composer require 'drupal/crop:^2.2'

The module provides two new permissions: Administer crop settings, allowing you to manage basic Crop API settings, and Administer crop types, which allows you to add, delete and edit defined crop types.

A place for managing permissions for the Crop API Drupal module where you can add or edit crop types

Use of the Crop API module

As we pointed out in the introduction, the Crop API module alone doesn’t allow for much. It should be seen as an interface that other modules can use. Nevertheless, it has several configuration options that we’ll try to explain.

Crop API provides a new entity type – Crop type. In this entity, we define the crop types we want to use.

Defining the crop types in the Crop type entity in the Crop API Drupal module

When adding a new crop type, we can configure several settings.

When adding a new crop type in the Crop API module, we can make new settings

Soft limit stretches the image to achieve the given proportions. Hard limit resizes and cuts off part of the image.

Hooks

Crop API provides one additional hook: hook_crop_entity_provider_info_alter. With it, we can change the information about the entity provider, which is calculated by default in the Drupal\crop\Annotation\CropEntityProvider class. In the hook, we have access to the $providers array. We can change it in order to, for example, edit the media provider title.

Extension modules

Crop API was created to serve as an interface that other modules can use. To obtain the full range of possibilities, it’s necessary to select one of the extension modules that best suits your needs in terms of functionality.

Image Widget Crop

The module provides a widget that allows the user to select one of the predefined crop types. It has a responsive mode for changing the type and for manual adjustment.

Selecting one of the predefined crop types with the Image Widget Crop

Option to manually crop an image in the Drupal Image Widget Crop module

Focal Point

The module allows us to specify the key point of an image, which will be treated as its center during the cropping process. If you've ever had the most important part of an image cut off by scaling with a hard crop, this module will prove to be a salvation.

The Crop API Drupal module – summary

Crop API is a useful tool that, in combination with supporting modules, provides customized cropping functionality. Installing this module is recommended if your website requires more flexible solutions than those available in the core.

Are you considering which modules to choose for your project? We'd be happy to suggest the tools that would be most suitable for it. On a daily basis, we develop websites on Drupal, using a number of modules and creating our own.

sftp-server(8) gains support for home-directory request

Damien Miller (djm@) has committed home-directory request support to sftp-server(8):

CVSROOT:	/cvs
Module name:	src
Changes by:	djm@cvs.openbsd.org	2022/08/11 23:20:28

Modified files:
	usr.bin/ssh    : sftp-server.c PROTOCOL 

Log message:
sftp-server: support home-directory request

Add support to the sftp-server for the home-directory extension defined
in draft-ietf-secsh-filexfer-extensions-00. This overlaps a bit with the
existing expand-path@openssh.com, but uses a more official protocol name,
and so is a bit more likely to be implemented by non-OpenSSH clients.

From Mike Frysinger, ok dtucker@

Boeing joins the ELISA Project as a Premier Member to Strengthen its Commitment to Safety-Critical Applications

The ELISA (Enabling Linux in Safety Applications) Project announced that Boeing has joined as a Premier member, marking its commitment to Linux and its effective use in safety-critical applications.

The post Boeing joins the ELISA Project as a Premier Member to Strengthen its Commitment to Safety-Critical Applications appeared first on Linux.com.

US political campaign emails can bypass Google’s spam filters under a newly approved pilot project

Federal election regulators voted Thursday to allow Google to proceed with a plan to make it easier for campaign emails to bypass spam filters. Google’s proposal to run a pilot project changing the filters for political emails came after intense Republican criticism that spam filters were biased against conservatives, a charge the tech giant denies. In a sign of public disgust with spam, the Federal Election Commission received thousands of public comments urging it to deny the request. But a majority of the six-member commission decided that Google’s project did not constitute an improper in-kind political contribution that would violate federal campaign finance laws. This reminds me of Twitter admitting it won’t ban nazis because that would mean banning accounts of Republican politicians. I remember the days being biased against nazis was a good thing. Times sure do change.

Pierce Lamb: Custom Workflow Orchestration in Python

(I put Custom Workflow Orchestration In Python in an Art Generator and the above popped out)

This post originally appeared on my employer VISO Trust’s blog. It is lightly edited and reproduced here.

On the Data & Machine Learning team at VISO Trust, one of our core goals is to provide Document Intelligence to the audit team. Every Document that passes through the system is subject to collection, parsing, reformatting, analysis, reporting and more. Every day, we work to expand this feature set, increase its accuracy and deliver faster results.

Why we needed workflow orchestration

Many individual tasks are executed to eventually produce what Document Intelligence provides, including but not limited to:

  • Security Control Language Detections
  • Audit Framework Control ID Detections
  • Named Entity Extraction like organizations, dates and more
  • Decryption of encrypted pdfs
  • Translation of foreign language pdfs
  • Document Classification
  • Document Section Detection

Until our workflow orchestration implementation, the features listed above and more were all represented in code inside a single function. Over time, this function became unwieldy and difficult to read, with snippets of ceremony, controls, logging, function calls, and more sprinkled throughout. Moreover, this is one of the most important areas of our app, where new features will be implemented regularly, so the need to clean this code up and make it easier to reason about became clear. Furthermore, execution inside this function occurred sequentially, despite the fact that some of its function calls could occur in parallel. While parallel execution isn't required in its current state, we knew that in the near future, features on the roadmap would necessitate it. With these two requirements:

  • task execution that is easier to reason about and
  • the ability to execute in parallel

We knew we needed to either use an existing workflow orchestration tool or write one ourselves. We began with some rough analysis of what was going on in our main automation function: we formalized each 'step' into a concept called Task and theorized about which Tasks could execute in parallel. At the time of the analysis, we had 11 'Tasks', each of which required certain inputs and produced certain outputs; based on these inputs and outputs, we determined that a number could run in parallel. With this context, we reviewed some of the major open source Python toolkits for workflow orchestration:

  • Luigi
  • Apache Airflow

Both of these toolkits are designed for managing workflows that have tens, hundreds, or even thousands of tasks to complete and can take days or weeks to finish. They have complex schedulers, user interfaces, failure modes, options for a variety of input and output modes, and more. Our pipeline will reach this level of complexity someday, but with an 11-Task pipeline, we decided that these toolkits added too much complexity for our use case. We resolved to build a custom workflow orchestration toolkit guided by the deep knowledge embedded in these more advanced tools.

Our custom workflow orchestration

The first goal was to generalize all of the steps in our automation service into the concept of a Task. A few examples of a Task would be:

  • detecting a document’s language,
  • translating a foreign language document,
  • processing OCR results into raw text,
  • detecting keywords inside text,
  • running machine learning inference on text.

Just reading this list gives one a feel for how each Task depends on a previous Task's output to run. Being explicit about dependencies is core to workflow orchestration, so the first step in our Task concept was defining what inputs a given Task requires and what outputs it will produce. To demonstrate Tasks, we will develop a fake example Task called DocClassifyInference, the goal of which is to run ML inference to classify a given document. Imagine that our model uses both images of the raw PDF file and the text inside it to make predictions. Our Task, then, will require the decrypted PDF and the paginated text of the PDF in order to execute. Further, when it's complete, it will write a file to S3 containing its results. Thus, the start of our example Task might look like:

https://medium.com/media/094d252043626f462dee2692b54f2b29/href
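
A minimal sketch of what that opening could look like; the key names, and anything about S3Task beyond what the next paragraph describes, are assumptions rather than the actual VISO Trust code:

# A rough sketch only: key names are assumed, and the base classes
# are sketched after the next paragraph.
class DocClassifyInference(S3Task):
    """Classify a document from its page images and extracted text."""

    # State this Task must find in the Pipeline before it can run.
    input_keys = {"decrypted_pdf", "paginated_text"}
    # State this Task promises to add to the Pipeline when it finishes.
    output_keys = {"doc_classification"}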

DocClassifyInference subclasses S3Task, an abstract class that enforces defining a method to write to s3. S3Task itself is a subclass of the Task class which enforces that subclasses define input keys, output keys and an execute method. The keys are enforced in a Pipeline class:

https://medium.com/media/767e8d6e412c0ec6b472df79028c536d/href
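
A sketch of how those pieces might fit together, assuming an in-memory dictionary for state and simple check methods for key enforcement (the method names and enforcement points are assumptions):

from abc import ABC, abstractmethod


class Task(ABC):
    """One step of the document-intelligence pipeline (sketch)."""

    # Concrete Tasks override these to declare the state they read and write.
    input_keys: set = set()
    output_keys: set = set()

    @abstractmethod
    def execute(self, pipeline: "Pipeline") -> "Pipeline":
        """Read declared inputs from the pipeline, set declared outputs, return it."""


class S3Task(Task):
    """A Task that also persists a result file to S3 (sketch)."""

    @abstractmethod
    def write_to_s3(self, data: bytes) -> str:
        """Upload the data and return the object location."""


class Pipeline:
    """In-memory state manager handed from Task to Task (sketch)."""

    def __init__(self, **initial_state):
        self._state = dict(initial_state)

    def get(self, key):
        return self._state[key]

    def set(self, key, value):
        self._state[key] = value

    def check_inputs(self, task: Task) -> None:
        # Enforce the declared contract before a Task runs.
        missing = set(task.input_keys) - set(self._state)
        if missing:
            raise KeyError(f"{type(task).__name__} is missing inputs: {missing}")

    def check_outputs(self, task: Task) -> None:
        # Enforce that a Task produced everything it declared.
        unset = set(task.output_keys) - set(self._state)
        if unset:
            raise KeyError(f"{type(task).__name__} did not set: {unset}")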

This Pipeline will become the object that manages state as our Tasks execute. In our case, we were not approaching memory limits, so we decided to keep much of the Task state in memory, though this could easily be changed to always write to and read from storage. As a state manager, the Pipeline can also capture, prior to executing any Tasks, the ceremony that downstream Tasks may require.

Continuing on with DocClassifyInference: as a subclass of the abstract class Task, it will have to implement execute as well (enforced by Task). This method takes a Pipeline and returns a Pipeline. In essence, it receives the state manager, modifies the state, and returns it so the next Task can operate on it. In our example, execute will extract the decrypted PDF and paginated text so they can be used as inputs for an ML model to perform document classification. Let's look at the entire stubbed-out DocClassifyInference:

https://medium.com/media/e70499b33e3aa2979d5713f967406298/href
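
Building on the Task, S3Task, and Pipeline sketch above, a stubbed-out version could look like this; the model call and the S3 write are placeholders, not real implementations:

class DocClassifyInference(S3Task):
    """Classify a document using its page images and extracted text (sketch)."""

    input_keys = {"decrypted_pdf", "paginated_text"}
    output_keys = {"doc_classification"}

    def execute(self, pipeline: Pipeline) -> Pipeline:
        # Extract the inputs this Task declared from the shared state.
        pdf_bytes = pipeline.get("decrypted_pdf")
        pages = pipeline.get("paginated_text")

        # Run the (stubbed) classification model on the images and text.
        classification = self._classify(pdf_bytes, pages)

        # Set what we declared we would set, persist the result to S3,
        # and return the state manager for the next Task.
        pipeline.set("doc_classification", classification)
        self.write_to_s3(str(classification).encode())
        return pipeline

    def _classify(self, pdf_bytes, pages):
        # Placeholder for the real inference call.
        return {"label": "security_audit_report", "confidence": 0.97}

    def write_to_s3(self, data: bytes) -> str:
        # Placeholder: a real implementation would upload the bytes to S3.
        return "s3://results-bucket/doc_classification.json"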

It’s easy to see how DocClassifyInference gets the Pipeline state, extracts what it needs, operates on that data, sets what it has declared it’s going to set and returns the Pipeline. This allows for an API like this:

https://medium.com/media/c6c6331180e2c768bebe8b6d3d93e156/href
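
For example, assuming DecryptPDF and PaginatedText Tasks defined along the same lines, the sequential call site is just a hand-off of the Pipeline from Task to Task (a sketch, not the original snippet):

# Purely sequential hand-off of the state manager (sketch; the seed key
# and the other Task classes are assumptions).
pipeline = Pipeline(raw_pdf=uploaded_file_bytes)
pipeline = DecryptPDF().execute(pipeline)
pipeline = PaginatedText().execute(pipeline)
pipeline = DocClassifyInference().execute(pipeline)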

This is of course much cleaner than what we had previously. It also lends itself to writing simple, understandable unit tests per Task, as well as adhering more closely to functional programming principles. So this solves our first goal of making the code cleaner and easier to reason about. What about parallel processing?

Parallel Processing

Similar to Luigi and Apache Airflow, the goal of our workflow orchestration is to generate a topologically sorted Directed Acyclic Graph of Tasks. In short, having each Task explicitly define its required inputs and intended outputs allows the Tasks to be sorted for optimal execution. We no longer need to write the Tasks down in sequential order like the API described above; rather, we can pass a Task Planner a list of Tasks and it can decide how to optimally execute them. What we'll want, then, is a Task Planner that is passed a list of Tasks, sorts the Tasks topologically, and returns a list where each member is a list of Tasks. Let's take a look at what this might look like using some of our examples from above:

https://medium.com/media/0a663e1d21fad19f6742fcb8626491c2/href
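
A sketch of that hand-off, using the Task names from this example (get_task_plan itself is sketched further below):

# The unordered Task list handed to the planner (sketch).
tasks = [
    CreateCSVOutput(),
    KeywordDetection(),
    RunDocInference(),
    PaginatedText(),
    DecryptPDF(),
]
task_plan = get_task_plan(tasks)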

Here I have retained our examples while adding two new Tasks: KeywordDetection and CreateCSVOutput. You can imagine these as matching keywords in the paginated text and combining the results of RunDocInference and KeywordDetection into a formatted CSV output. When the Task Planner receives this list, we'll want it to topologically sort the tasks and output a data structure that looks like this:

https://medium.com/media/1f13a65d01989925558a170c8f56b694/href
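
Sketched with the same Task names, the returned structure would look roughly like this:

# Each inner list is a "stage"; Tasks within a stage can run in parallel (sketch).
expected_task_plan = [
    [DecryptPDF()],                           # nothing to wait on
    [PaginatedText()],                        # needs the decrypted PDF
    [RunDocInference(), KeywordDetection()],  # both only need earlier outputs
    [CreateCSVOutput()],                      # combines the two results above
]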

In the above list, you can imagine each of its members as a 'stage' of execution. Each stage has one to many Tasks; in the case of one, execution occurs sequentially, and in the case of many, execution occurs in parallel. In English, the expected_task_plan can be described like so:

  • DecryptPDF depends on nothing and creates a consumable PDF,
  • PaginatedText depends on a consumable PDF and creates a list of strings
    – RunDocInference depends on both and classifies the document
    – KeywordDetection depends on paginated text and produces matches
  • CreateCSVOutput depends on doc classification and keyword detection and produces a formatted CSV of their outputs.

An example of the function that creates the expected_task_plan above might look like:

https://medium.com/media/3c129c3f6c9c4e794bec10497bf706a7/href
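
A sketch of such a planner built on graphlib.TopologicalSorter (Python 3.9+); the duplicate-key check and the staging loop follow the description below, while the exact structure and error messages are assumptions:

from graphlib import TopologicalSorter


def get_task_plan(tasks):
    """Sort Tasks into stages based on their declared input/output keys (sketch)."""
    # Map every output key to the Task that produces it; duplicate output
    # keys would make the dependency graph ambiguous, so fail fast.
    producers = {}
    for task in tasks:
        for key in task.output_keys:
            if key in producers:
                raise ValueError(f"Two Tasks produce the same output key: {key}")
            producers[key] = task

    # A Task depends on whichever Tasks produce the keys it consumes.
    sorter = TopologicalSorter()
    for task in tasks:
        dependencies = {producers[key] for key in task.input_keys if key in producers}
        sorter.add(task, *dependencies)

    # Drain the graph in "ready" batches; each batch becomes one stage of the
    # plan, and every Task within a stage can execute in parallel.
    sorter.prepare()
    plan = []
    while sorter.is_active():
        stage = list(sorter.get_ready())
        plan.append(stage)
        sorter.done(*stage)
    return plan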

This function gets the list of Tasks, ensures that no two Task outputs have identical keys, adds the nodes to a sorter by interrogating the Task input_keys and output_keys and sorts them topologically. In our case the sorter comes from graphlib’s TopologicalSorter which is described here. Getting into what each of these functions are doing would take us too far afield so we will move on to executing a task plan.

With the expected_task_plan shown above, an execute_task_plan() function is straightforward:

https://medium.com/media/b734c2dfa055816d3da58282a3a85802/href
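
A sketch of that executor; it assumes the parallel Tasks mutate the shared Pipeline in place, and TaskThread is described next:

def execute_task_plan(task_plan, pipeline):
    """Run each stage of the plan; multi-Task stages get one thread per Task (sketch)."""
    for stage in task_plan:
        for task in stage:
            pipeline.check_inputs(task)  # fail fast if the plan is wrong
        if len(stage) == 1:
            # A single Task runs inline on the calling thread.
            pipeline = stage[0].execute(pipeline)
        else:
            # Independent Tasks each get a TaskThread. They mutate the shared
            # Pipeline in place, so execute()'s return value isn't needed here.
            threads = [
                TaskThread(target=task.execute, args=(pipeline,), name=type(task).__name__)
                for task in stage
            ]
            for thread in threads:
                thread.start()
            for thread in threads:
                thread.join()  # re-raises any exception captured in the child thread
    return pipeline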

Here we iterate over the task list, deciding between sequential and parallel execution. In the latter case, we utilize Python's threading.Thread class to create a thread per task and use the idiomatic methods for starting and joining threads. Wait, then what is TaskThread?

In our case, we wanted to ensure that an exception in a child thread will always be raised to the calling thread so the calling thread can exit immediately. So we extended the threading.Thread class with our own class called TaskThread. Overriding threading.Thread’s .run() method is fairly common (so common that it’s suggested in run()’s comments); we overrode run() to set an instance variable carrying an exception’s content and then we check that variable at .join() time.

https://medium.com/media/32dad77b5be2c90b1dc3e9ef857987cc/href
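
A sketch of that subclass; the attribute name used to carry the exception is an assumption:

import threading


class TaskThread(threading.Thread):
    """A Thread whose child-thread exceptions resurface in the caller at join() (sketch)."""

    exc = None

    def run(self):
        try:
            # Default behaviour: call the target passed to the constructor.
            super().run()
        except BaseException as exc:
            # Stash the exception instead of letting it die with this thread.
            self.exc = exc

    def join(self, timeout=None):
        super().join(timeout)
        if self.exc is not None:
            raise self.exc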

The calling thread can now try/except at .join() time.

Conclusion

With these structures in place, the file containing the automation service's primary functions was reduced from ~500 lines to ~90. Now, when we create our threadpool to consume SQS messages, we get the Task plan like so: task_plan = get_task_plan(), and pass the task_plan into each thread. Once execution reaches the main function for performing document intelligence, what previously was a large section of difficult-to-read code now becomes:

https://medium.com/media/86e2f5c586197e6dd85c8f3c16d22409/href
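
In sketch form, with assumed function and key names, the main function reduces to seeding the Pipeline and handing everything to the executor:

def perform_document_intelligence(document_bytes, task_plan):
    # Entry point after the refactor (sketch; the function and seed key names
    # are assumptions, not the original code).
    pipeline = Pipeline(raw_pdf=document_bytes)
    return execute_task_plan(task_plan, pipeline)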

The introduction of parallel processing of these Tasks shaved consistent time off of performing document intelligence (an average of about a minute). The real benefit of this change, however, will come in the future as we add more and more Tasks to the pipeline that can be processed in parallel.

While we’ve reduced the time-to-audit significantly from the former state-of-the-art, we are definitely not done. Features like the above will enable us to continue reducing this time while maintaining consistent processing times. We hope this blog helps you in your workflow orchestration research.