
Building A Simple Scraping Website With PHP Laravel Part 2: Dashboard and CRUD

In this part of the tutorial (Building a simple scraping website), we will create a simple dashboard and CRUD operations for our modules.

 

 


 

First, we need modules for the database tables we created in the previous part. I won't explain those modules in detail, since they are ordinary CRUD operations, but I will explain the parts related to scraping.

 

Creating Controllers

Let’s first create the required controllers. I will use resource controllers here, so run these commands in the terminal:

 

Next, let’s add the routes for those controllers.

Open routes/web.php and add the following code:

As you can see above, all the module URLs are placed under the dashboard group. For example, the article list is available at /dashboard/articles.
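Laravel's `Route::resource()` registers seven conventional routes per controller: index, create, store, show, edit, update and destroy. The plain-PHP sketch below shows the URLs one resource exposes under the `dashboard` prefix. It does not call Laravel. The helper `dashboardResourceRoutes()` and its `$param` argument are hypothetical names used only for illustration.

```php
<?php

/**
 * Hypothetical helper: lists the seven routes that Laravel's Route::resource()
 * registers for one resource, prefixed with "dashboard".
 * It only builds strings; nothing is registered with Laravel.
 */
function dashboardResourceRoutes(string $resource, string $param): array
{
    $base = '/dashboard/' . $resource;
    $item = $base . '/{' . $param . '}';

    // action => [HTTP verb(s), URI]
    return [
        'index'   => ['GET',       $base],
        'create'  => ['GET',       $base . '/create'],
        'store'   => ['POST',      $base],
        'show'    => ['GET',       $item],
        'edit'    => ['GET',       $item . '/edit'],
        'update'  => ['PUT|PATCH', $item],
        'destroy' => ['DELETE',    $item],
    ];
}

// Print the route table for the articles module
foreach (dashboardResourceRoutes('articles', 'article') as $action => [$verb, $uri]) {
    printf("%-9s %-34s %s\n", $verb, $uri, $action);
}
```

In routes/web.php itself, a group like this is typically written as `Route::prefix('dashboard')->group(...)` wrapping one `Route::resource(...)` call per controller. Running `php artisan route:list` shows the routes Laravel actually registered.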

 

First, modify app/Http/Controllers/Controller.php like this:

 

 

 

Adding Categories

Let’s start with the category module. Open app/Http/Controllers/CategoriesController.php and modify it like this:

Next, create a new view resources/views/dashboard/category/index.blade.php and add the following code:

resources/views/dashboard/category/create.blade.php

resources/views/dashboard/category/edit.blade.php

The code above is self-explanatory: it is a simple CRUD for the categories module. First we updated the categories controller, then we added three views to list, create, and edit categories.

 

Adding Websites

Now open up app/Http/Controllers/WebsitesController.php and modify it like this:

resources/views/dashboard/website/index.blade.php

resources/views/dashboard/website/create.blade.php

resources/views/dashboard/website/edit.blade.php

Make sure to create an uploads/ directory inside the public/ folder and make it writable. As the code above shows, uploaded website logos are stored in this directory.
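The directory requirement can also be checked in code before a logo is saved. Below is a minimal plain-PHP sketch. It assumes the target directory is passed in as `$uploadDir` (public/uploads in this tutorial). The function name `prepareLogoUpload()` is hypothetical. It only prepares the directory and a unique file name, and it is not the tutorial's own controller code.

```php
<?php

/**
 * Hypothetical helper: ensure the uploads directory exists and is writable,
 * then return a collision-resistant file name for an uploaded logo.
 */
function prepareLogoUpload(string $uploadDir, string $originalName): string
{
    // Create the directory if missing (recursively); re-check in case of a race
    if (!is_dir($uploadDir) && !mkdir($uploadDir, 0755, true) && !is_dir($uploadDir)) {
        throw new RuntimeException("Could not create {$uploadDir}");
    }
    if (!is_writable($uploadDir)) {
        throw new RuntimeException("{$uploadDir} is not writable");
    }

    // Keep only the extension from the client-supplied name
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));

    // Timestamp plus random hex makes name clashes very unlikely
    return time() . '_' . bin2hex(random_bytes(4)) . ($ext !== '' ? '.' . $ext : '');
}
```

Inside a Laravel controller, the returned name could then be passed to `$request->file('logo')->move(public_path('uploads'), $name)`. That call assumes the form's file field is named `logo`.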

Adding Item Schema

As mentioned in the previous article, the item schema represents the structure of a single item in a list of items. We therefore need to construct an expression that describes that structure.

 

Open app/Http/Controllers/ItemSchemaController.php and modify it like this:

resources/views/dashboard/item_schema/index.blade.php

resources/views/dashboard/item_schema/create.blade.php

resources/views/dashboard/item_schema/edit.blade.php

A typical CSS expression takes this structure:

For example, the item schema expression to pull articles from the New York Times website is as follows:

As shown, the expression lists every field that needs to be fetched, separated by “||”. Each field has two parts: the field name, and the CSS selector enclosed in square brackets “[]”. The field name must match the corresponding field name in the database. For attributes such as an image’s src, we add the attribute name inside “[]” after the CSS selector.
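Here is a minimal sketch of how such an expression could be turned into an array of fields. It follows the spirit of the `translateCSSExpression()` method used later in the Scraper class, but it is a standalone guess rather than the author's implementation. It makes three assumptions:

- The attribute is written as a second bracket pair, for example `image[img][src]`.
- Selectors contain no square brackets of their own.
- The sample expression is made up, not the New York Times one.

```php
<?php

/**
 * Sketch: parse an item schema expression such as
 *   title[h2 a]||excerpt[p.summary]||image[img][src]
 * into ['title' => ['selector' => 'h2 a', 'attr' => null], ...].
 * Assumes selectors themselves contain no "[" or "]".
 */
function translateCssExpression(string $expression): array
{
    $fields = [];

    foreach (explode('||', $expression) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue; // tolerate a trailing "||"
        }

        // name[selector] with an optional [attribute] suffix
        if (!preg_match('/^(\w+)\[([^\[\]]+)\](?:\[([^\[\]]+)\])?$/', $part, $m)) {
            throw new InvalidArgumentException("Malformed field: {$part}");
        }

        $fields[$m[1]] = [
            'selector' => trim($m[2]),
            // group 3 is absent from $m when no attribute was given
            'attr'     => isset($m[3]) ? trim($m[3]) : null,
        ];
    }

    return $fields;
}
```

Failing loudly on a malformed field, instead of skipping it, makes a typo in a saved schema visible as soon as a link is scraped.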

Adding Links

The links module is the most important one. It stores the links we fetch data from, and it performs the actual scraping.

app/Http/Controllers/LinksController.php

In the code above, focus on the scrape() method. It does the actual scraping by calling another class, Scraper, which we will implement shortly. Scraper fetches and scrapes a given link like so:

Create a new file app/Lib/Scraper.php and add the following code:

The main method in the code above is handle(). It uses the Goutte client package: it takes a link object and creates a crawler object from the link’s URL.

Then it translates the CSS expression of the item schema attached to that link into an array of fields and their selectors, using the translateCSSExpression() method:

After converting the expression into an array, we move on to filtering. We pass the main filter selector to the filter() method, which gives us a collection of matching nodes. We iterate over them with the each() method, and inside its callback we extract the individual pieces of data like this:

Using the $node variable passed to the callback, we can filter the child elements we need, such as titles and images. To do this, we loop over $translateExpr, the translated CSS expression, and return a $data array to be saved into the database.
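The filter-then-each pattern can be sketched without installing Goutte. This version swaps in PHP's built-in `DOMDocument` and `DOMXPath`, so every selector is an XPath expression instead of a CSS selector. Field selectors start with `.` so they are evaluated relative to the current item node. The function name `extractItems()` and the `$fields` shape (`name => ['selector' => ..., 'attr' => ...]`) are assumptions made for illustration.

```php
<?php

/**
 * Stand-in for the filter()/each() loop. PHP's built-in DOMXPath replaces
 * Goutte's CSS filtering, so all selectors here are XPath expressions.
 */
function extractItems(string $html, string $mainFilter, array $fields): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $data = [];

    // One iteration per item container, like ->filter($mainFilter)->each(...)
    foreach ($xpath->query($mainFilter) ?: [] as $node) {
        $row = [];
        foreach ($fields as $name => $field) {
            // Query relative to the current item node; query() returns false on bad XPath
            $list = $xpath->query($field['selector'], $node);
            $match = $list ? $list->item(0) : null;

            if ($match === null) {
                $row[$name] = null; // no matching node: keep the key, leave it empty
            } elseif ($field['attr'] !== null) {
                $row[$name] = $match instanceof DOMElement ? $match->getAttribute($field['attr']) : null;
            } else {
                $row[$name] = trim($match->textContent);
            }
        }
        $data[] = $row;
    }

    return $data;
}
```

With Goutte, the corresponding Symfony DomCrawler calls are:

- `$crawler->filter($selector)->each(function ($node) { ... })` to select and loop over the item containers.
- `$node->filter($selector)->text()` to read an element's text.
- `$node->filter($selector)->attr($name)` to read an attribute.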

It’s best to wrap the scraping code in a try/catch block, because fetching can fail at any time for many reasons. Examples include a lost network connection or no nodes matching the given expressions.
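Here is a minimal sketch of that error handling, assuming the fetching and filtering work is passed in as a callable. The hypothetical `safeScrape()` helper turns any exception into a status array, so the caller can show a message instead of a crashed page. Treating an empty result as a failure is a design choice made here, not something the tutorial prescribes.

```php
<?php

/**
 * Hypothetical wrapper: run one scraping job and convert any failure
 * (network error, no matching nodes, ...) into a result array.
 */
function safeScrape(callable $job): array
{
    try {
        $data = $job();

        if ($data === []) {
            // Nothing matched the expressions: report it instead of saving nothing silently
            return ['status' => false, 'msg' => 'No items matched the given expressions', 'data' => []];
        }

        return ['status' => true, 'msg' => 'OK', 'data' => $data];
    } catch (Throwable $e) {
        // Catch both exceptions and engine errors so one bad link cannot break the page
        return ['status' => false, 'msg' => $e->getMessage(), 'data' => []];
    }
}
```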

resources/views/dashboard/link/index.blade.php

resources/views/dashboard/link/create.blade.php

resources/views/dashboard/link/edit.blade.php

 

Create app/Http/Controllers/HomeController.php, which will be our home controller, and add this code:

 

Create resources/views/home.blade.php and add this code:

 

Now modify routes/web.php to be like this:

Modify resources/views/layout.blade.php to add the actual links.

Now navigate to http://localhost/web_scraper/public/ and try adding websites and categories.

 


In the final part of the tutorial, we will implement the frontend and the article display.

 

Continue to part 3: Article Display In Home Page
