Web Content Curation Datasheet

gework · 03-16-2019, 02:35 AM

This is a datasheet about web content curation - making money online from pre-existing data that you have enriched. This will be most of use to people who are looking for a direction for their work or are already online. That you become a programmer is virtually essential in my opinion. If that's not of interest, you may find something of note in the final section.

You can go down two routes with this:

1) becoming a resource to inform people's buying decisions (more affiliate)
2) becoming a reference source (more CPC ads, possible API business)

Some benefits of this model are:

1) You don't have to constantly update your site (like a blog)
2) You don't have any clients or bosses (like an eCommerce site, freelancing)
3) You don't have to reside in one location (and deal with Western women)
4) It is very easy to disappear into the international grey space
5) You can choose whatever sector you want to work in
6) You don't have to produce much if any original content
7) Profit margins are potentially huge

As a result of this I have virtually zero stress, despite often working far more hours than most. The only problems are those I make for myself, to be solved.

Prerequisites

Going down this road requires quite a broad set of skills, as you will be developing something yourself from nothing and will probably have limited experience and limited resources to hire people. You'll need to know HTML, CSS, JavaScript, a server language, a database system, basic SEO, usability, Linux and some info on the market you want to cover.

I won't go into much detail about learning any of this as there are already threads on that. But in short, start with tutorial sites like W3Schools; then read the manuals for whatever server language you want to learn (like PHP). Also switch to using Linux (like Ubuntu) so you will know how to manage your own server (actually very simple). I'd also almost always avoid any content management systems or frameworks like WordPress or Symfony. They are horrendously bloated and will stop you from learning as much as you would if you made your sites yourself. For my work, I use a very basic HTTP handler, a DB handler, a template handler and a form handler. I built these about ten years ago; and build whatever I need for a job in a standardised way on top of that. If you start with WordPress you may never learn how to build complex things yourself.

One big issue for me when learning programming was doubt. I was always worried about my ability and was unsure if I'd ever be good at it. This ended when I decided to make something quite complicated and had to learn a lot of things on my own. After this it was easy.

It's also a good idea to get yourself in among some people you can learn from, who are working at the highest level you can get your hands on, i.e. a good local web development company. This is the fastest way to learn. University is a complete waste.

One of the most important areas to learn is SEO (search engine optimisation) – or increasing your rankings in search engines, which may be your primary source of traffic. Although there can be a lot to know, you don't need to know much to start. Learn how to make pages that search engines can read and understand, learn the importance of links, sitemaps and basic technical details. Avoid bad SEO (known as black or grey hat) that tries to trick search engines into giving you more traffic. Focus on the quality of your content over time, rather than cheap manipulations. Everyone I know who went down the latter route eventually gave up as their life revolved around constantly building new things of little value that were quickly washed away.

Another key area to get schooled in is economics and basic market forces. When I was young I made a number of sites, just because I came up with the idea. They weren't particularly good from a monetisation point of view and I had little idea how to get the best I could from them. I was oblivious to market forces and just doing what came to mind, with no game plan, hoping something might happen.

Usability is another key, and quite similar, area to know about. It doesn't matter what your site looks like. What matters is that it keeps as many people on it as possible, it gets a good reputation and has as many people funneled towards your money spinners as possible.

This will be a long path to go down; and like any path you don't know where it will lead. You may waste time, but if you want any of the benefits listed above, it's the only way. Some sort of casual mentorship can be good in any area. In terms of programming and web development I was put on the right track very early. If not, I would have been blind and may have fallen into blind practices or spent much longer finding out the best practices.

The last point is that you will have far more ideas than ability to execute them. When I started with what I am doing now I had lots of ideas that I either dropped or still haven't got round to many years later; on top of several spin off ideas I never started.

Only those who have got the resources and experience of one success can take on multiple projects. If all you are bringing is your own work and a little bit of capital then you can only handle one project.

Look at the opportunities, assess how people may react, the way the market is blowing, how much search traffic you can get, what your competition is etc. and then keep going on that path until you succeed or decide it's not going to work (at least one year). If you do fail, don't treat it as wasted time, but a time of learning. Come back again with your new knowledge, experience and a better idea.

Data

The basis of content curation is finding data that others have produced and presenting it in a way that is more useful, easier to find and richer than the original source. This data may be found in one source, multiple sources or from a disparate number of sources that you collate. In the latter case web scraping may be used.
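As a rough illustration of the collating step, here is a minimal Python sketch that merges records from several sources into one enriched record per item. The function name `collate`, the field names and the sample values are all made up for the example; real sources would need their own cleaning and key matching first.

```python
from collections import defaultdict


def collate(sources):
    """Merge records from several sources into one record per "id".

    `sources` maps a source name to a list of dicts that each carry an "id".
    When two sources supply the same field, the source listed first wins.
    Returns (merged_records, provenance) so every field can be traced back.
    """
    merged = defaultdict(dict)
    provenance = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            key = record["id"]
            for field, value in record.items():
                if field == "id" or value in (None, ""):
                    continue  # skip the key itself and empty values
                if field not in merged[key]:
                    merged[key][field] = value
                    provenance[key][field] = source_name
    return dict(merged), dict(provenance)


# Hypothetical sample data standing in for three separate sources.
sources = {
    "gazetteer": [{"id": "paris-fr", "name": "Paris", "lat": 48.8566, "lon": 2.3522}],
    "weather": [{"id": "paris-fr", "avg_temp_c": 12.4}],
    "photos": [{"id": "paris-fr", "name": "Paris, France", "photo_count": 3}],
}
places, origin = collate(sources)
print(places["paris-fr"])
print(origin["paris-fr"])
```

Keeping a provenance map alongside the merged data is one possible design choice; it makes it easier to re-run a single source later without guessing which fields it supplied.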

Examples

Here are some examples of sites that use pre-existing data to make money. Traffic and income estimates from SimilarWeb and WorthOfWeb.

1) CoinMarketCap – aggregates trading data on over 2,000 cryptocurrencies, which they extract from freely available APIs of crypto exchanges, i.e. the data itself is not deemed as commercially valuable, but their presentation and enrichment of the data led it to becoming the 100th most visited site in 2017; and they must have made something like $10 million in raw profit that year. The site is not something a good programmer could not make and have pretty well oiled in one year.

Estimates: 1.9 million visits per day, netting $75,000

2) CPU Boss – aggregates data on CPUs from a handful of technical and benchmark sources, with affiliate links to buy CPUs.

Estimates: 93,000 visits per day, netting $1,250

3) Behind The Name – collected and user-submitted name meanings, fluffed with ads.

Estimates: 130,000 visits per day, netting $1,500

4) Geographical Names – this site is a US government produced database of global place names attached to a Google Map. That's it. It could be enriched by all sorts of other data like photos pulled in from Panoramio, weather, etc. Many governments produce similar databases of place names in their country, usually listed under the name gazetteer. This would be a great starter project as it's a very simple and under-developed niche. I have no doubts that if you made it your job to reproduce all the similar government databases you can find, you would be location independent.

Estimates: 19,000 visits per day, netting $145

5) MyIP.ms – this site will easily net $100,000 per year from collating data from a few sources like WHOIS and DNS servers. They also sell their data via API and wholesale, which could net $1,000,000 or more.

Estimates: 54,000 visits per day, netting $1,400

6) British Listed Buildings – this is another example of a site that has taken one freely available database produced by the UK government and done little more than spew it out on a site, add in Google maps code and a comment form. Many governments produce a database of listed buildings. If you aggregate them all and enrich them with sources from Google Books, image sites and so on, again, you're location independent.

Estimates: 6,250 visits per day, netting $70

7) DomComp – this site makes it easy for you to find the cheapest place to get a domain name from. Very simple. The good thing about a site like this is people have already decided they are going to make a purchase and they are going through you to find the cheapest way to spend money, so affiliate conversion will likely be high.

Estimates: 4,500 visits per day, netting $116

Starting

If you are a novice developer with little or no experience, as mentioned, get yourself a job somewhere you can learn, even if it's an internship. Read up a bit on economics, markets and SEO. Look around the web for ideas and evaluate opportunities. Once you have some skills and an opportunity, I would start with something smaller, but probably also something that could go towards netting $100 per day, at which point you will have more skills and experience, be better placed to tackle a much bigger project and have some capital to do so; as well as have the opportunity to move somewhere cheap for low costs of living and las mas dulces lindas; ili tanke djevojke.

A much better and broader version of the above mentioned Geographical Names site would be a good start.

A few other more basic ideas:

1 – OCR – there is a website (can't remember the name) that just reproduces text from OCRed books from Google Books. Not a major money spinner, but an easy start. There are a number of sources of book images that aren't OCRed online. Some of them are from newspaper archives. For example the Greek government hosts images of many historic Greek newspapers without OCR text. You could grab them. Better, The Times of London is available via an academic provider. Anyone can get it for free from Bedfordshire Libraries, regardless of where you live. You can even extract the newspaper without being logged in. OCR them and dump them online with ads. There is also another academic product called Early English Books and another called something like 19th Century Books (forget the name). Most of them are not available via Google Books etc.

2 – Expired sites – You can get lists of expired domains from a site like expireddomain.net or Register Compass. You could use them to rip out the old copy of the site from Archive.org and host them on something like GeoCities Archive. I have done something similar, but sold the site. A decent starter to intermediate project.

3 – eBay – Other good intermediate projects could come from ripping eBay. One would be to compile a catalog of historic coins. Scrape eBay for all the coins for sale, aggregating the average price by quality. You can use the images to create image galleries and write up the text from pre-existing sources. The good thing about these is that it's a niche for older people with a bit of money and in those sorts of niches there is essentially no real competition as there are few people who are both interested and can produce a good site. You can do the same with stamps. You can also rip full scale postcards that people are selling on there. You can scrape over 100,000 per month. Catalog them, link them to maps and what not. Add your ads. Another simpler one is to archive all the antique and collectible items on there. There is a site that does this, but it's not very good.
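The "average price by quality" step from the coin idea could look something like the sketch below. It assumes the listings have already been collected and cleaned into dicts; the field names (`coin`, `grade`, `price`) and the prices are invented for illustration and are not real market data.

```python
from collections import defaultdict
from statistics import mean


def average_price_by_grade(listings):
    """Group listings by (coin, grade) and return the mean price of each group."""
    prices = defaultdict(list)
    for item in listings:
        prices[(item["coin"], item["grade"])].append(item["price"])
    return {key: round(mean(values), 2) for key, values in prices.items()}


# Hypothetical, already-cleaned listings.
listings = [
    {"coin": "1921 Morgan Dollar", "grade": "VF", "price": 30.0},
    {"coin": "1921 Morgan Dollar", "grade": "VF", "price": 34.0},
    {"coin": "1921 Morgan Dollar", "grade": "MS63", "price": 80.0},
]
averages = average_price_by_grade(listings)
for (coin, grade), price in sorted(averages.items()):
    print(coin, grade, price)
```

With enough listings a median may be a safer figure than a mean, since a few mispriced items can drag the average around.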

APIs

Depending on the data you are cataloging, you may also be able to sell access to your data via API. If you get to this point you'll also need to lock your website down from web scraping, which is fairly advanced. Both the above mentioned sites CoinMarketCap and MyIP.ms offer APIs and probably make more than $1 million per year from data it costs them virtually nothing to collect.

This is an area I have been going into and moving towards focusing on as it's become apparent that it's probably worth several times what the site makes from ads and affiliates. So when looking for your topic it's a good consideration to keep in mind – can you sell the data you are collecting via an API or wholesale.
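One common building block for both a paid API and basic scraping protection is a per-client rate limit. The sketch below is a simple in-memory token bucket in Python, keyed by API key or IP address. It is only an assumption about how this might be done, not a description of what any of the sites above actually use, and the names `TokenBucket` and `allow_request` are made up for the example.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, now=None):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets = {}  # client id (API key or IP address) -> TokenBucket


def allow_request(client_id, now=None, rate_per_sec=1.0, capacity=5):
    """Return True if this client may make a request right now."""
    bucket = buckets.get(client_id)
    if bucket is None:
        bucket = buckets[client_id] = TokenBucket(rate_per_sec, capacity, now)
    return bucket.allow(now)


# Six requests at the same instant: the burst of five passes, the sixth fails.
burst = [allow_request("demo-key", now=0.0) for _ in range(6)]
# Two seconds later two tokens have been refilled, so a request passes again.
later = allow_request("demo-key", now=2.0)
print(burst, later)
```

The `now` parameter exists only so the behaviour can be checked without waiting; in real use it would be left out. A determined scraper will rotate IPs, so this is a first layer rather than a full lock-down.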

Full Spectrum Specialisation

As I see it, the key with this game, and probably most other games, is becoming well rounded in the multiple fields required to operate in the game. For this it's: interface, programming, SEO, usability, servers, economics, psychology; plus knowledge of your niche. You'll find very few people fit that bill. Most people aren't even very good at one thing. You need all those skills to be able to push yourself out of the gates with nothing but your own time. Once you get capital, then you can delegate jobs to people you handpick based on knowing the field.

This is a big problem I have observed with businesses. The owners build up the business and then they end up getting in staff or 3rd parties to areas they know nothing about, like marketing or online. They can end up getting in someone who is a giant dead-weight – the sort of person who is petrified of some of his underlings who he knows are better than him. In organisations I've seen how these people clog up and cause huge inefficiencies. The owners know no better as they have no idea what it is those people do.

My initial niche is very centric to older, wealthier people. Within about a year my site had the second highest traffic in the niche; and the most within about two years. The people who ran competitor sites knew the topic, but they were dinosaurs when it came to online and not well orientated to providing what would be most useful to people.

Then within four years the site was in the top ten for traffic in the sector, which is worth about $2 billion per year. Higher than a company with over 100 employees; while I had one. The reason is this large company doesn't have anyone who knows enough about the full-spectrum of the areas it needs to get right. They may have them disparately spread throughout the company, but there is no one who can bring it together and probably a good number of managers who are more concerned about the safety of the job they got because the person who hired them didn't know the area they were hiring for.

If you read the history of Amazon you'll see this is why the company has become so successful. Bezos started out as a programming nerd. He switched to customer service, then product management, then hedge fund management. He had fairly full-spectrum knowledge and experience of all the areas you would need to know about to start picking people who could do as good a job as you and knit them all together. Whereas if you look at a lot of other companies like Facebook, DuoLingo and Google, the brains behind them are programming nerds who have relied on a bunch of big money guys to make sure they don't make their companies only about having a litter tray for their trans-species staff.

TitanEssence · 03-16-2019, 03:01 AM

Most solid post I've seen in a while. Thanks for sharing!

Bain · 03-16-2019, 08:14 AM

Good information -

Lampwick · 03-16-2019, 11:02 AM

Great datasheet Gework. Long overdue +1 from me. Have you extended this practice to mobile apps, or do you stick with web?

balybary · 03-16-2019, 02:25 PM

I am working on a project like that, using RSS feeds and APIs of other websites to collect data.

One interesting point is that you don't need to invest a lot of money when you start.

In my case:

VPS (Virtual Private Server) with Ubuntu: $48 per year

A basic laptop with Ubuntu: $265

Tutorials on how to build your first website: free

gework · 03-17-2019, 04:58 AM

Quote: (03-16-2019 11:02 AM)Lampwick Wrote:

Great datasheet Gework. Long overdue +1 from me. Have you extended this practice to mobile apps, or do you stick with web?

I have no experience with apps. I've thought about it, but with what I'm doing there doesn't seem to be huge interest in apps. There is one company in the sector with 5M+ downloads and another two with 1M+. But most of the apps that are similar are at 100K+; and they are not things that would be used much.

The best idea I had was to make some quizzes/games with some of the data in the future. It seems with apps you need to get regular users and you're looking at about $120 for 10,000 daily users. I don't think I'd make $5 a day with my current content as it's not going to be used much. My main goal is the API, which I want to push towards making $0.05 / second.
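To put the $0.05 / second target into daily and yearly terms, a quick Python calculation (working in whole cents to avoid rounding noise; the function name is made up for the example):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def dollars_per_day(cents_per_second):
    """Convert a per-second rate in cents into dollars per day."""
    return cents_per_second * SECONDS_PER_DAY / 100


daily = dollars_per_day(5)  # 5 cents per second
yearly = daily * 365
print(daily, yearly)  # 4320.0 1576800.0
```

So the target works out to roughly $4,300 per day, or a little under $1.6 million per year, assuming the rate held around the clock.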

One of the other good things about curation is that it is fairly easy to know if it will generate good revenue from the outset. If you have the skills and can make better versions of the sites listed, then after time you will be making money. If you are doing SaaS, products or blogs you have to put a lot of effort in with little idea of how well they will be received.

Quote: (03-16-2019 02:25 PM)balybary Wrote:

I am working on a project like that, using RSS feeds and APIs of other websites to collect data.

VPS (Virtual Private Server) with Ubuntu: $48 per year

Good luck. Kimsufi are a very cheap and reliable option for when you need to upgrade to a dedicated server. You can run big sites on those for about $120-200 per year. I pay about $600 at the sister company, SoYouStart, though could no doubt run it from the $120 server as I use a 100% static HTML cache.
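For anyone unfamiliar with the idea of a static HTML cache, the sketch below shows the general pattern in Python: render a page once, write it to disk, and serve the file on later requests. This is only an illustration of the technique under simple assumptions, not the setup described above; `cache_path`, `get_page` and the stand-in `render` function are hypothetical names.

```python
import tempfile
from pathlib import Path


def cache_path(cache_root, url_path):
    """Map a URL path such as /places/paris to a file under the cache root."""
    parts = [p for p in url_path.split("/") if p]
    if ".." in parts:
        raise ValueError("path traversal is not allowed")
    relative = "/".join(parts) or "index"
    return Path(cache_root) / (relative + ".html")


def get_page(cache_root, url_path, render):
    """Return cached HTML if it exists; otherwise render once and store it."""
    target = cache_path(cache_root, url_path)
    if target.exists():
        return target.read_text(encoding="utf-8")
    html = render(url_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(html, encoding="utf-8")
    return html


calls = []  # records how often the expensive render step actually runs


def render(url_path):
    """Stand-in for the slow part (database queries plus templating)."""
    calls.append(url_path)
    return "<h1>" + url_path + "</h1>"


root = tempfile.mkdtemp()
first = get_page(root, "/places/paris", render)
second = get_page(root, "/places/paris", render)
print(first == second, len(calls))  # True 1
```

In practice the web server would usually be configured to serve the cached files directly, so the application is only hit on a cache miss, and cached files are deleted when the underlying data changes.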

I started out with a $25 HostGator, which was able to handle about 6,000 visitors per day; until they were bought out and I had to move over to dedicated with the above company.

My starting capital was $25 for hosting and $120 for a domain. I only went down the route as I was in a very difficult situation at the time. Good argument against UBI.

The Grey · 03-17-2019, 06:18 AM

Excellent thread, thanks for taking the time to share. If you can discover something people want and deliver it to them, the potential profits are juicy. When you think of the low costs as well, it's a golden age for people with certain skills. I recently picked up programming and I'm genuinely interested in building something like this.

One area I would like to know more about - do you have a process for generating new ideas like these? I'm not asking for your specific tactics, but can you shed some light on how you gauge demand?

gework · 03-17-2019, 09:31 AM

Quote: (03-17-2019 06:18 AM)The Grey Wrote:

One area I would like to know more about - do you have a process for generating new ideas like these? I'm not asking for your specific tactics, but can you shed some light on how you gauge demand?

I don't come up with many ideas, as it's best to stay away from them. I got distracted trying to make a crypto site last year, but I just don't have the time for it. Mainly I will see pre-existing ideas, but as noted in the post I had a couple of other ideas. Most of the ideas I've had I can't remember off the top of my head.

But the site I run, I first noted the idea when I was 16. There was a site listing links to other sites and I noticed that some of them went through funny links like: http://jdocqury.com/partnertlink?2332423&pref=34234

And assumed this must have been some way of tracking traffic, so the referring site could earn commissions. I noted it was something I could do; and did many years later.
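To see what such a link carries, the query string can be pulled apart with Python's standard library. Which value identifies the partner and which the product or page is a guess here; the example URL above does not say.

```python
from urllib.parse import parse_qsl, urlsplit


def tracking_params(url):
    """Return the query string of a link as a list of (name, value) pairs."""
    query = urlsplit(url).query
    # keep_blank_values=True keeps bare tokens such as "2332423" that have no "=".
    return parse_qsl(query, keep_blank_values=True)


params = tracking_params("http://jdocqury.com/partnertlink?2332423&pref=34234")
print(params)  # [('2332423', ''), ('pref', '34234')]
```

The affiliate network logs these values when the visitor lands, which is how a later sale can be credited to the referring site.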

If there are any areas you are interested in, like sports supplements or cars, look at the information sites in the niche; any comparison sites. Look at the main affiliate brokers like CJ.com. See what's being done. You may be able to get more ideas from those or see how they can be done better. If you pick something good you should soon have lots of ideas of features for the site and can choose the best ones.

I based the site on two other sites, with the aim of doing what they did bigger and better. There wasn't much unique about it. To assess it I gauged the approximate amount of traffic to the site, which I think was about 20,000 per day. I considered that the site was a helping aid for people to make buying decisions and assumed that would lead to good conversion rates. The main affiliate programs were paying out 25-35% (info products). I noted that it was also low competition, as it's mainly a niche for old people who can't make sites. That's why I suggested a catalog of stamps above. It's for old people and there is no serious competition in terms of reference sites.

So I knew there was the potential to get to around 20,000 hits per day (it took 2 years to get to this level). I presumed I could make something like $600 or £600 per day getting 20,000 visits per day. So I presumed I could make $/£100 from 3,000 visits per day.
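The back-of-envelope scaling in that estimate can be written out as below. It assumes revenue scales linearly with visits, which is a simplification. On that assumption the 3,000 figure is a round number: it comes out at about 90 per day, with 100 per day needing roughly 3,300 visits.

```python
def revenue_per_visit(daily_revenue, daily_visits):
    """Average revenue earned per visit."""
    return daily_revenue / daily_visits


def visits_needed(target_daily_revenue, per_visit):
    """Visits per day needed to hit a revenue target at a given per-visit rate."""
    return target_daily_revenue / per_visit


per_visit = revenue_per_visit(600, 20_000)
revenue_at_3000 = round(3_000 * per_visit, 2)
visits_for_100 = round(visits_needed(100, per_visit))
print(round(per_visit, 4), revenue_at_3000, visits_for_100)  # 0.03 90.0 3333
```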

I made the site and put it up. As it was a brand new domain it spent most of the first six months in the sandbox (which definitely does exist). I think the average traffic was about 75-150 visits per day to start with. It helped that the site was big - about 150,000 pages and they were low competition (SERPs). Even with that amount of traffic I was getting about a sale per day and the average commission was about $15.

The site popped out of the sandbox and went up to 100s of visits per day. And it was probably sometime around then that I made the first £100 in a day. So by that point I knew it was going to be able to make as much as I wanted, within the first month or so. But for the first six months it popped in and out of the sandbox. I just recycled all the income into getting more content on the site.

So your variables are: level of competition, % affiliate payouts, conversion rate, predicted traffic.
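Three of those variables can be combined into a simple expected-revenue estimate, sketched below in Python. Level of competition is not in the formula because it mostly shows up through the traffic you can realistically predict. The 1% conversion rate and $50 order value are hypothetical inputs, chosen so that a 30% payout gives the $15 average commission mentioned earlier.

```python
def expected_daily_revenue(visits, conversion_rate, order_value, payout_rate):
    """Estimate daily affiliate income from traffic and programme terms."""
    sales = visits * conversion_rate        # expected number of sales per day
    commission = order_value * payout_rate  # income per sale
    return sales * commission


# 100 visits per day, 1% of visitors buying, $50 orders at a 30% payout.
early = round(expected_daily_revenue(100, 0.01, 50.0, 0.30), 2)
print(early)  # 15.0
```

Plugging in a few pessimistic and optimistic values for each input gives a quick range to compare niches before building anything.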

If you can get a lot of pages on a low competition site then that's a very good start. It obviously takes time to ease a site into search rankings and that will make it easier for you. That would be where certain government databases come in handy. Like that British Listed Buildings site. It's a database of 500,000 buildings (low competition) with probably 10s of millions of words of text that are just lying there, waiting to go online. Something like that would be a very easy starter site, but not a long-term project.

Also, in particular with new sites Google doesn't like a lack of unique text on pages, or sites with lots of pages. If starting a brand new domain with 10,000s of pages it's best to ease them into being indexed with robots.txt, no metas.
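One way to stage indexing with robots.txt is to split the site into sections by URL prefix and only stop disallowing a section when you are ready for it to be crawled. The Python sketch below generates such a file and checks it with the standard library parser. The section names and the `build_robots_txt` helper are made up for the example, and how quickly to release sections is a judgment call.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(all_sections, released):
    """Disallow every section that has not been released for crawling yet."""
    lines = ["User-agent: *"]
    for section in all_sections:
        if section not in released:
            lines.append("Disallow: /" + section + "/")
    if len(lines) == 1:
        lines.append("Disallow:")  # an empty Disallow means everything is allowed
    return "\n".join(lines) + "\n"


sections = ["a-c", "d-f", "g-i"]  # hypothetical URL prefixes
robots = build_robots_txt(sections, released={"a-c"})
print(robots)

# Sanity check with the standard library's robots.txt parser.
parser = RobotFileParser()
parser.parse(robots.splitlines())
open_ok = parser.can_fetch("*", "http://example.com/a-c/page.html")
blocked = parser.can_fetch("*", "http://example.com/d-f/page.html")
print(open_ok, blocked)  # True False
```

Each time another section is released, the file is regenerated with that section added to `released`.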

I'm not so keen on the prospect of product sites. No one really wants to link to your affiliate site of product reviews. But good reference sites will become one of the most linked to sites in the sector over time. You don't really need to link build for them. I've done two days of link building in several years.

Running Turtles · 03-17-2019, 01:39 PM

Quote: (03-17-2019 09:31 AM)gework Wrote:

Quote: (03-17-2019 06:18 AM)The Grey Wrote:

One area I would like to know more about - do you have a process for generating new ideas like these? I'm not asking for your specific tactics, but can you shed some light on how you gauge demand?

...

When you say reference sites, do you mean informational sites as opposed to product sites?

gework · 03-21-2019, 02:42 PM

Quote: (03-17-2019 01:39 PM)Running Turtles Wrote:

When you say reference sites, do you mean informational sites as opposed to product sites?

Yes. The seven numbered sites I think of as reference sites - sites that people go to (more) for information rather than products, services or articles.
