Working in LLM companies.

A few friends asked me, on separate occasions: “Would you work for OpenAI or Anthropic?” I thought I would share my thought process.

Oct 31, 2024

I was asked whether I would be interested in working on LLMs and, if so, why.

I am not experienced enough to work on the model itself: updating weights and running training. But I have the experience and skills to build, contribute to, and operate all the necessary components around it.

The API layer, authentication, storage, privacy, and so on: all the infrastructure.

Not the most glamorous work, but necessary.

Companies that need this kind of work and these skills are companies like OpenAI or Anthropic. What follows applies to both of them.

Growing Needs

There is no doubt in my mind that, in the short term, demand for LLM tokens will keep growing steeply. This drives two needs. The need for more powerful and capable models. And the need for more performant, scalable and secure infrastructure.

Few other fields right now are growing as fast as GenAI. The need for infrastructure work will not stop. Not until the consumption of LLM tokens plateaus.

Even in mature tech fields, I keep seeing steep increases in volumes that were already mind-blowing. I can only imagine what the infrastructure growth of LLM companies looks like.

I am also sure that cost is a relatively secondary concern. I expect the cost of the infrastructure, excluding training and GPUs, to be marginal: still big, but relatively small. That leaves enough budget to value engineering excellence and fast delivery.

With fast-growing infrastructure needs and budget as a lower priority, the work is interesting.

Interesting Work

Few organizations right now need to build infrastructure as fast as they do.

Suppose a growth rate of 5x year over year, which I believe to be conservative.

The problem I would expect to face is designing systems that can support 100x the current volumes.

A fast and experienced team can roll out a new piece of infrastructure in around 6 months. By then, volumes have already grown roughly 2.5x.
The business expects infrastructure to stay current for 2 years; otherwise, it makes no sense to build it. (Now 5x times 5x times 2.5x = 62.5x.)
If infrastructure work is a priority, it means it is the current bottleneck (or a close second). As soon as you remove one bottleneck, the business finds new needs and use cases. A further 2x increase is conservative.

Infrastructure in these organizations must be designed for ~125x the current load.
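The estimate above is back-of-the-envelope arithmetic. A minimal Python sketch makes each assumption explicit; the constant and function names are mine, while the factors are the ones used in this post. It also surfaces one caveat: compounding 5x per year over 6 months gives √5 ≈ 2.24x rather than 2.5x, which puts the target nearer 112x than 125x. Either way, it is in the ~100x range.

```python
# Back-of-the-envelope design target, using the growth factors from this post.
ANNUAL_GROWTH = 5.0      # assumed 5x year over year
BUILD_TIME_YEARS = 0.5   # ~6 months to roll out new infrastructure
LIFETIME_YEARS = 2.0     # infrastructure expected to stay current for 2 years
UNLOCK_FACTOR = 2.0      # extra demand once the bottleneck is removed


def growth(years: float, annual: float = ANNUAL_GROWTH) -> float:
    """Compounded growth factor after `years` at `annual`x per year."""
    return annual ** years


def design_target(build_factor: float) -> float:
    """Load at the end of the infrastructure's useful life, relative to today."""
    return build_factor * growth(LIFETIME_YEARS) * UNLOCK_FACTOR


post_estimate = design_target(2.5)                    # 6 months approximated as 2.5x
compounded = design_target(growth(BUILD_TIME_YEARS))  # 6 months compounded: sqrt(5)

print(f"post estimate: {post_estimate:.1f}x")  # 125.0x
print(f"compounded:    {compounded:.1f}x")     # 111.8x
```

The gap between the two numbers is small compared to the uncertainty in the 5x and 2x assumptions themselves, which is why a round "~100x" is a reasonable design goal.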

The actual design of engineering infrastructure is already very interesting. But there is more.

To build and operate infrastructure that scales quickly, teams need to grow. Teams can grow in headcount and in capabilities. To grow team capabilities, you need to develop new engineering tools and develop the team itself.

Growing in headcount comes mostly with management challenges. But it brings interesting engineering challenges as well, for instance mentoring and onboarding.

Growing in capabilities is usually a more interesting challenge for engineers. Engineers will focus on building tools and developing the team.

Take engineering tools, for instance monitoring and alarming. Monitoring and alarming give the team the ability to quickly figure out whether something is wrong with the system. Another example is traffic-shaping tools, which give the team the ability to deploy new code or components to production quickly, easily and safely.
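Traffic shaping comes in many forms. As one illustration, here is a minimal Python sketch of a percentage-based canary rollout, assuming each user is identified by a string id. The functions `bucket` and `route` and the fleet names "canary" and "stable" are hypothetical, not taken from any particular company's tooling. Hashing the id keeps each user pinned to the same fleet while the rollout percentage changes.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user id to a stable bucket in [0, buckets)."""
    # SHA-256 is stable across processes and machines, unlike Python's
    # built-in hash() for str, which is salted per process.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(user_id: str, canary_percent: int) -> str:
    """Send roughly `canary_percent`% of users to the hypothetical canary fleet."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return "canary" if bucket(user_id) < canary_percent else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    # Ramp the canary up step by step and check how much traffic it receives.
    for percent in (1, 5, 25, 100):
        share = sum(route(u, percent) == "canary" for u in users) / len(users)
        print(f"{percent:>3}% requested -> {share:.1%} routed to canary")
```

Because a user lands in the canary whenever their bucket is below the percentage, anyone routed to the canary at 5% stays there at 25%. Ramping up therefore never bounces users back and forth between versions, and rolling back is just setting the percentage to 0.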

Developing the team means shaping and mentoring it to operate more efficiently. Efficient teams are small and cheap, fast to deliver value, with low overhead. Creating efficient teams means overcoming challenges around communication, knowledge sharing, reviews, etc.

All these are interesting challenges. And those challenges will be abundant given the growth prospects of these companies.

Work is not only about what is being built, but also about social connections.

Given the very challenging problems and the high budgets, colleagues will be top-notch. Besides being technically excellent, colleagues will be open to challenges and change; otherwise, they would not have left their previous jobs. People will also tend to be doers rather than talkers. It is hard to hide in a small company with many hard problems to solve.

Impactful Work

LLMs and GenAI are going to change the world.

They will unlock a lot of human potential in the coming years and decades.

The model itself is fundamental. But it is not the only important component.

Equally fundamental for users are the day-to-day operations of the whole infrastructure. The best LLM is useless if nobody can use it, or if its availability is so low that companies cannot rely on it for their operations.

At AWS, they say it wouldn’t matter if all their code were open sourced. What matters is the exceptionally high operational bar they maintain.

If you believe, as I do, that LLMs will be transformative for humanity in the coming years, working on LLM infrastructure will be some of the most impactful work you can do.

Besides being an active part of this huge change, working on these technologies at these companies will have a big impact on one’s own career and growth.

Shaping the tools, the technologies, the teams and the operations of hypergrowing companies teaches a lot of lessons: lessons about technology, business and team dynamics.

Not all of it will be useful: few companies will replicate the same conditions.

But it will teach a lot in a short amount of time. Problems that show up after 3 years in companies growing at a normal rate show up after one year in hypergrowing companies. It exposes you to many more challenges in the same amount of time.

Morally Gray

Like all automation tools, LLMs raise some moral concerns.

Is it moral to work with the goal of automating large chunks of white-collar jobs? Is it right?

While I find these concerns valid, they require a political solution. Not a technical one.

Automation will happen.

But automation will free up a lot of human capital for more interesting and rewarding ventures.

People currently bogged down in bureaucracy could do art, research, and more human work.

I fall on the positive and pragmatic side of the issue. While I see the risks and concerns, I still think it is a positive development.

It makes no sense to try to stop it.

Risky Developments

LLMs are still very much on the frontier of tech.

Working in the field means tying your compensation and your recent work experience to the success of those companies.

The price per token is falling. Open-source models are very competitive and improving. The business itself is capital intensive and still relies on external funding.

I am definitely positive that LLMs will have a great future. But it is too early to say that the companies working on them today will be successful and dominate the field.

The work you put into those companies may well go to zero in a few short years, for reasons completely outside tech and your influence.

But it is also true that those years of effort could compound immensely. They would compound for reasons not only related to the technology you create. And for reasons definitely outside your control and influence.

I don’t think it is a sure bet. But for a young engineer with some financial stability and skills, it is definitely a bet worth considering.

Conclusion

I am in a blessed position. Personally, professionally and financially.

From my position, if the right opportunity arises I would jump on building infrastructure for LLMs.

It is the opportunity of a lifetime, and one I can afford to take.

Thoughts from the trenches in FAANG + Indie
