AI, ChatGPT, and Bing…Oh My

Steven Sinofsky
Learning By Shipping
21 min read · Feb 22, 2023


And Sydney too. Consolidating some thoughts on an exciting two weeks of surprises, advances, and retreats in AI.

The following brings together three Twitter threads on the launch of ChatGPT, the launch of Bing Chat, the surprises, and the (too predictable) pullback of “Sydney”.

On the launch of ChatGPT

1/ No exaggeration to say that maybe ChatGPT has captured the imagination of the broadest set of people in the shortest time of any computing technology. That anyone can instantly use it and see for themselves is a big part of it. How will productivity using “it” evolve?

Home screen of ChatGPT

2/ The polarization of fear <> opportunity is consistent with basically everything these days. That makes me a bit sad. I’m definitely one of those that see this as an advance. I am a believer in full throttle innovation and not second guessing at every step.

ChatGPT headlines.

3/ Lots of 4-D chess predicting where things will go. Who will win or lose? How much a platform shift is “AI” or not? It’s too soon to know. If PC, phone, cloud, or internet are a guide — wary/pessimists will quickly fall behind because exponential growth is like that.

4/ There are parallels to learn from and help guide us on how technology will evolve. Not the one path, but the sorts of paths that can follow. History rhymes. Why? Because both producers and consumers are humans and humans follow patterns, not precisely though.

5/ First, in the next 6–12 months every product (site/app) that has a free form text field will have an “AI-enhanced” text field. All text entered (spoken) will be embellished, corrected, refined, or “run through” an LLM. Every text box becomes a prompt box.

6/ This is a trivial add for most any product. Some will enhance with more bells & whistles. For example there might be an automatic suggestion (API costs aside) or several specific “query expansions” that take the text and guide the enhancement. Everyone will call the API to embellish, summarize, tone check, etc.

7/ This will be done to call attention to the new feature but also to add more surface area upon which to prove there is some depth to the work beyond just feeding what one types to the LLM.
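The “every text box becomes a prompt box” pattern from the points above can be sketched in a few lines. This is a hypothetical illustration: `PromptBox`, `enhance`, and the injected `complete` callable are all invented for the sketch, not taken from any real product or API.

```python
from dataclasses import dataclass
from typing import Callable

# A minimal sketch of wiring a free-form text field to an LLM.
# `complete` stands in for a real LLM API call (e.g., an HTTP request
# to a completions endpoint); it is injected so the sketch stays
# self-contained and testable.

@dataclass
class PromptBox:
    complete: Callable[[str], str]  # the LLM client, injected

    def enhance(self, user_text: str,
                instruction: str = "Improve the grammar and tone of:") -> str:
        # A simple "query expansion": wrap whatever the user typed
        # in a guiding instruction before sending it to the model.
        prompt = f"{instruction}\n\n{user_text}"
        # Human in the loop: the result is a *suggestion* the user
        # reviews and accepts or edits, not an automatic rewrite.
        return self.complete(prompt)

# Usage with a stub model; a real deployment would call an LLM API here.
box = PromptBox(complete=lambda prompt: prompt.splitlines()[-1].capitalize())
suggestion = box.enhance("the quick brown fox")
```

The design point is the thinness of the layer: the product feature is little more than prompt wrapping around the user’s text, which is why nearly every app can add it in months.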

8/ This reminds me of the mundane example of how spell-checking moved from a stand-alone feature, to integrated into word processors, to suites, and then 💥 it showed up in the browser. All of a sudden it wasn’t an app feature but every text box had squiggles.

A 1996 web page showing background spelling in a browser.

9/ At first this was neat. Then we quickly learned just how much we type in edit boxes is not amenable to spell checking. All those street names, part numbers, and people names. If you blindly autocorrected you’d end up with gibberish. You still do. Don’t judge prematurely!!

10/ LLMs will be like this too. In other words the invention will rapidly diffuse but actually “create work” for people as in order to realize the benefit there needs to be a “human in the loop” to smooth out the enhanced or “fix” text.

11/ People will be very critical of this step. Many will pooh-pooh the innovation saying that it takes more time to fix than it does to just do the work the first time. Grammar checking used to be like that as well. But not for everyone…

PS/ Some will immediately want to ban the use of a tool that is “wrong” or “removes humans”. You might think spell checking is trivial, but I had to get permission to use it frosh year to write papers. In high school I had to ask the principal to use it. Just like calculators.

12/ Then after some period of time there will be a much more significant advance in the technology. Old spell checking was a dictionary on a local PC. Then Google used all it brought to the party to release “cloud spell check” which used…the internet.

13/ Spelling (and grammar) have not been the same since. That was 10 years after the first browsers and something like 25 years after the first stand-alone spell checkers for PCs and now everyone has it. But no one is differentiated by it. Why?

PS/ Here’s a history of red squiggles.

036. Fancy Wizard and Red Squiggles
The idle loop is a devil’s playground. –Developers on the Word team https://hardcoresoftware.learningbyshipping.com/p/036-fancy-wizard-and-red-squiggles

14/ There were many stand-alone companies that offered spelling (or analogous writing tools). Many *word processors* focused on “better typing” and adding spelling, grammar, tabs, page numbers, etc. But then a feature war ensued. Hundreds and thousands of new features.

Image of floppy disk containing MicroPro SpellStar stand-alone spell checker for DOS.

15/ Many bemoan these feature wars as “bloat” but they play an important part in how customers consume products and it means you can’t be a “one trick pony” in the market. There is no “simple”. Consumers acquire productivity tools for the “worst case” not the simple or only case.

Chart comparing features of Word Processors from the early 1980s.

16/ Critically, the winning product is one that does the most *important* work, not the most mundane work. Ex, typewriters were good at filling out forms, but word processors were not. It took years before forms were a WP thing, but books, manifestos, etc. → WORD!

17/ What matters is doing important work, not simply automating cheap or easy work. The tools that win will generalize to the most important problems people face. The cost of adding additional tools or “point solutions” is much higher than savings.

18/ People think automating the simple “eliminates jobs”. That has not been shown to be the case. In fact, what has tended to happen is more automation/creation tools mean more people add “creation” to their jobs. Now we all fill out forms, not just administrative support.

19/ Better tools bring creation closer to where “human in the loop” adds most value. PowerPoint is an example of that. We all might bemoan “slides” but with a great tool (for *important* work) the most skilled/knowledgeable will use the tools to do what was previously “support” work.

20/ When email was on the rise and Office was just Word, Excel, PowerPoint we were deeply concerned that all “documents” would become simple ASCII mail messages — all those features would no longer matter. *WRONG* Email made everyone a “content contributor” even big time execs.

Heading “Why is Microsoft Office so hard to kill?”

21/ People *did* stop using Word for one page meeting agendas. Many stopped taking notes in Word. (Ditto for equivalent Excel or PowerPoint tasks). BUT when work mattered and was *important* the human-in-the-loop contributing mattered and so did having the “most powerful” tool.

22/ That’s why Office, soul-sucking Office so to speak, is still used by hundreds of millions of people. It isn’t inertia but it is because the *important* work is done there and on the outside chance something important needs to be done, those features matter.

23/ Many seem to think LLMs will “eliminate” jobs or wipe out whole swaths of creative work. I think what will transpire will happen in two phases. First, all creation tools used will be augmented with LLM, and very quickly. Everyone will use these enhancements — human in the loop.

24/ Then over time new tools will emerge that can subsume the old tools. This is what happened as typewriters replaced typesetting, which were replaced by character word processing, replaced by graphical, and only now are we seeing changes in writing — 35 yrs later.

25/ Key is new tools, built from the ground up assuming LLMs, will need to get their footing doing the most important work and they also need to do the mundane. They need to be approachable by the people doing the mundane and the important. Otherwise they’re just point solutions.

26/ Why things evolve this way is subtle. In the work environment, there is no shortage of “important” — everyone thinks their work is important. Every department. Every creative task. There are endless requirements or “needs” that will be thrown up as barriers to change.

27/ Everyone’s job looks easy from the outside. But until you actually try to automate a job out of existence or replace the workflow or tooling of a job, you don’t really know what is the human part or where the “work” happens.

28/ Most of all, once tools start getting used by more people then the need for what is produced goes way up. That means more people need to do more work. That means expectations for what is “good” or “important” all go up. New tools don’t simply automate, they create work too.

29/ One final example. In physical sciences and math, writing used to require special typesetting (math, physics) or even draftsmen (for molecular models, lab bench workflow). There were humans in every company/department that did this work.

30/ LaTeX, ChemDraw, and more came along and then all researchers were creators. Papers had more diagrams and math. Expectations for presenting information went up. And those draftsmen/typesetters weren’t just eliminated but used these tools for even more advanced work.

31/ This shows that even in highly domain specific and advanced tooling, that massive improvements do not simply make everyone’s job easier (or vanish) but add work for people — for experts — to do more, to create more, and most of all to be humans in the loop. // END

On the evolution of productivity using ChatGPT and the issues that will be faced

Adding to the excitement over LLMs and continuing the implications for productivity scenarios. LLMs represent the first tech advancement that has a potential to seamlessly deploy across 7 billion smartphones and thus can be a platform shift. HUGE bull here. 1/

2/ Tech has been pining for a platform shift for ~10 years as mobile/social/cloud settled in. What’s next? Of course it has to be huge — the whole planet adopted the current platform. By layering on top of M/S/C and not requiring a device or legacy reset, the potential is huge. Yay!

3/ So much said already about “Disruption of Google”, “Leadership of Microsoft”, “Changing of the guard”, and general “disruption” or “whole new world”. This is all premature or just silly. OTOH the technology will happen. The “cat is out of the bag” and mass deployment is next.

4/ The opinions have polarized like everything else in today’s society. The arguments “for” or “against” quickly amount to choosing sides and preparing for battle. “Misinformation”, “Does not Understand”, “Eliminates/reinvents all jobs” and more. Ugh.

5/ LLMs meet an important & necessary but not sufficient criterion for platform shifts. They don’t yet work all the time, boundary cases are plentiful. Recall from “Hardcore Software” the first decade of PCs was literally “making them work”. The Internet? Security, broken links, etc.

6/ BUT BUT to dismiss some concerns is as naive as dismissing the exponential nature of change. There’s a ton of hype. One thing I’ve learned about AI over a number of “winters” is the hype is due to “generalizing” what amounts to a significant advance in a point solution.

7/ The winters keep happening because the technologists and punditry tend to take a single advance and generalize it. Like advances in programming languages, AI can indeed make one scenario easier and doesn’t need to make all scenarios easier/possible.

8/ That said, LLMs represent a collision of sorts across two dimensions. LLMs are seen by default as either:

  • optimistic/positive and the future of all software, productivity, creation
  • skeptical/negative and needs to be a priori constrained and made into a “responsible” tech

9/ When in reality, most want a broad middle: let’s all benefit from advances in the market. Those abusing or misusing it should be held accountable.

Never before has there been a broadly available new tech at such low cost so quickly been presumed a net negative by so many. Sigh.

10/ There are three areas (A, B, C) of rightful concern that include the presumed negatives and temper the blind optimism.

The most important aspect is that this is what creates opportunities for next generation companies and makes it all so difficult for incumbents.

11/ (A) LLMs, especially with respect to web search, fundamentally break, or at least call into question, the internet (the web) itself. Yes.

The web relies on linking. Search relies on crawling. Entities that contribute to the web permit crawling for the benefit of being linked to.

12/ If LLMs simply use the crawling side of the internet without returning links then the incentives to permit crawling and ultimately linking go away.

For the largest content sites that have subscriptions or can otherwise afford this, it is OK. But as we saw with news, almost none can.

13/ Today this is already a tension filled area. BUT it is not new. When I managed Microsoft’s search efforts (2006–7) huge *effort* was going into properly creating snippets and competing with what became “Google Onebox”. Providing answers without needing to click.

14/ The *effort* was how to avoid breaking the web, stealing content, and properly advancing fair use. This led to a series of industry changes such as licensing content, partnerships, search “verticals”. And a failure of book scanning.

15/ So while everyone wants search to simply “tell me the answer” unless the search provider “knows” the answer or “bought” the answer, or it is public domain, it can’t simply unilaterally crawl the internet to find it for free.

16/ This leads to (B) The legal system has no framework for this mass scale reuse. Is this Betamax, Napster, YouTube/UGC? Copyright laws, private property, libel, fair use, derivative works, plagiarism, and more run right into both training LLMs and the output.

17/ The “cat is out of the bag” was the argument for the Betamax and later commercial skip. But those ended differently and importantly home use is very different than the “storage, rebroadcast, retransmission” of someone else’s data for MASS consumption.

18/ No amount of “needle threading” too-clever lawyering will change the reality that training is making a copy of someone else’s data, using it, and even if the exact copy is discarded a derivative work is produced.

18A/ Imagine if every result came with every source that contributed to prompt output — almost a debug view. Is that legit “sourcing” or essentially a denial of service attack for the legal system?

19/ In junior high school we all learned: first, you can’t copy the Greek history section of an encyclopedia and call it a report on Sparta. Second, you can’t simply move the words around and add a sentence from another work if most of the report was from World Book.

20/ Many lauded Bing’s use of footnotes and sources. Putting aside the future role paid placement will have in those, one needs to be incredibly clear that these are not always sources in a verifiable/legit sense. You can’t source something and say something different.

21/ But wait, 1) there’s a link so go figure it out. And 2) it’s just a computer algorithm taking text, rewriting it, adding other stuff, and so on just like a person. EXACTLY, and a person (especially a money-making one) SHOULD BE RESPONSIBLE.

22/ In the past 25 years of Section 230, a person could post something on the internet and wherever they posted it did not count as the author. Now we have a first-party author, human directly involved or not, and that is exactly what is not protected.

23/ Of course big companies know this, and so going through third party APIs or using other third parties to do scraping, rewriting, etc. seems like a form of legal insulation. Read the Getty v. Stability suit for exactly this reason.

24/ YouTube when it was acquired was viewed as a liability for Google because of the copyright issues with UGC. Over time YouTube became a model for the legit protection of IP. It took a decade. And YT is very different now than it was then in that regard.

25/ There’s no getting around the fact that one site can’t simply move the words around and provide a link and claim that is just “research” or “fair use” or “derivative work”. There’s 200 yrs of law and constitutional protection at work.

26/ When it comes to “disruption” a big part of my belief is that Google spent 20+ years wrestling with this topic while also balancing business needs/desires. Their caution is the result of the reality they experienced.

27/ This leads to the third big area of concern (C) and that is the notion of “responsible AI” which I believe will lead to a significant tempering of output but also a *HUGE* missed opportunity to make the world’s knowledge more accessible.

28/ “Responsible AI” is the first time a technology spawned an almost police action before it was even deployed — primarily coming about during the early days of image recognition. Imagine if we had locked down early PCs a la Trustworthy Computing, but in 1985 or the internet in… twitter.com/i/web/status/1…

29/ The biggest US companies and their CEOs have created “Policy Recommendations for Responsible Artificial Intelligence” where, before any real use/deployment, they already called on Congress to regulate AI [sic]. s3.amazonaws.com/brt.org/Busine…

30/ These of course appear “good” but they cannot possibly survive the complexity of information, knowledge, scientific peer review, political parties, school boards, and also the world of what is deemed “acceptable” at any given time.

31/ Much of Reddit has been consumed trying to get Bing or OpenAI to say bad words or worse some “cancelable” offense. As it turns out this is not difficult. Worse, it is easy to stumble into those crossed with clear factual errors.

32/ The first answers to these problems will be to retreat and only say things that are “established facts” and “acceptable in today’s context”. As we know, humans are not allowed to say bad things even if they caveat them with “this is how people talked” or “I’m quoting”.

33/ The most mundane topics become off limits or “not worth the risk”. Even in a business context, this is enormously difficult. I can’t even make a complete list of all the times I dealt with spelling dictionaries, maps, clip art, and even fonts that were deemed “irresponsible”.

34/ Quite simply, the whole idea of “default responsible” when it comes to generating content based on user questions without human review of every input and output is unsolvable. There has to be room for mistakes, offenses, or worse.

35/ But how can that be with a default commitment to responsible. Worse, even if legal liability is removed, even with a EULA/waiver/consent box, no entity wants the endless/ongoing PR crisis of the day every time a news event causes a new wave of prompts and generated answers.

36/ This commitment from CEOs/lawyers/comms/HR is an invitation to a priori regulate AI. In many ways this is the worst position to be in — regulating something before it has even really been invented. They asked for it to happen, promising only the “responsible” output.

37/ So along with the existing legal framework needing adjusting to account for an unprecedented scale of automated “use” (fair or otherwise), the notion of “responsible AI” will need to be revisited lest it twist itself around every side of every issue.

38/ This is not “trustworthy computing” because that was binary — protect from bad people even at the expense of usability as we stated. This is a proactive agenda designed to appease a subset of customers and can’t possibly please all constituents of every issue.

39/ “Responsible AI” is much more like Google’s IPO promise of “Don’t be Evil”. There was great skepticism about that at the time, and great hope. The skeptics were proven right because the world is complex and murky and unknown, not just good and evil.

40/ What does this mean then in practice? First, big companies are going to end up continuing to constrain the scenarios and “sanitize” the output. Huge swathes of content will simply not exist for fear of being “irresponsible”, “bad PR”, or illegal (or potentially so).

41/ Big companies will end up focusing on mundane results, especially in search, that will effectively provide a better expression of “OneBox” answers for known topics with scrubbed inputs, prompt kill-lists, hand-coded default responses, apologies, etc.

42/ Second, productivity tools and use cases for LLMs will end up focusing on much more narrow cases of mundane and repetitive work. This is work where LLMs are basically improved grammar/spelling/templates for common interactions.

43/ The biggest barrier to using a basic Word template has been just customizing it to the exact customer/use context without breaking grammar. LLMs make this easy.

44/ LLMs will be valuable to some degree for summarizing first party content, improving first party writing, or even modifying first party images using available/licensed images (eg, show this photo of our new product being used on an airplane)

45/ Unfortunately, these are not the “important” cases. These will not drive whole new layers of productivity tools. They will be great additions to existing workflows and tools, such as CS or CRM tools.

46/ Therein is the big opportunity: new tools that approach hard problems and high value prompts AND also from outset working within the developed legal framework while taking advantage of the world’s knowledge. Those have a huge advantage.

Invent. Take risks BigCo can’t. // END

PS/ Love this example. Shows how the much lauded notion of using sources only makes a generated answer more authoritative when in fact it is not. This is a trivial compilation. Generated compilations/summaries should be noted “…according to the intern with no domain knowledge”.

Bing Pulls Back on Its Chat, aka Sydney

Microsoft limits Bing chat to five replies to stop the AI from getting real weird theverge.com/2023/2/17/2360… // NOOO. IMO a significant strategy error misreading the failure of the past week & over-correcting. This compounds the mistake of conflating LLMs, Bing, and Search in general. 1/

Microsoft limits Bing chat to five replies to stop the AI from getting real weird. If you talk to the AI too long, it might tell you it loves you.

2/ With OpenAI we *broadly* experienced the new wonders of a generative text platform. It gave us all a taste of an entire new form of creativity — generative creativity. In this thread I argued for the importance of “human in the loop” for productivity. (see above)

3/ Many people started to see a great deal of “fun” (or entertainment or wasting time or broadly creativity) by careful and crazy prompt engineering. ChatBot has the makings of an entire new form of tool, but it was very early.

4/ Many were very quick to focus on errors, hallucinations, and the crazy side of what was generated. Many existing AI researchers from big tech piled on, seeing limitations as risks. Of course “Responsible AI” chimed in over concerns where OpenAI had put mitigations in place.

5/ MS chose to build on top of OpenAI with an additional layer but far more importantly positioned their additions as a reinvention of web search and as a disruptive force unlike anything Google had seen before. Microsoft v. Google, the rematch. Really? Chat reinvents search?

6/ I’m skipping over Google but my thread below goes into the view that Google had been wrestling with AI and trying to be *accurate* for decades and their success in search led to caution. This could feel like the innovator’s dilemma. Or it could just be prudent. (See above)

7/ After 48 hours prudent looked correct. Bing with LLM morphed into Sydney and the excitement was quickly replaced by endless stories of crazy talk, uncanny experiences with Sydney, and a host of responsible AI problems.

8/ Then the AI researchers at other companies were quick to point out that this was all expected. Again, this anchoring was on accuracy, truth, tone, guardrails, and all the things expected from “computers are always right” and “search is for finding facts”.

9/ So now we’ll see endless punditry cycles about how AI was not ready and we need much more work on responsibility. The above limits on Bing are the first step. All a direct result of placing LLMs in context of search and productivity.

Is this MS *causing* the next AI Winter?

10/ In Search domains, the consumer expectation is consistent with “computers” which is accuracy, emotionless, predictable, non-biased and so on. Sydney was, by virtue of engineering, NONE of those things. It was designed to surface patterns and combinations not yet imagined.

11/ In that sense, and that sense specifically, Sydney was a fun and novel breakthrough even over what we all had just started to absorb with ChatBot. This is where the misread/misfire starts to reveal itself. What was built was never going to be good at search.

12/ Search was the primary problem MSFT chose to solve — burning capital, not really profitable, plus it was 20 yrs of flailing (I managed the team Christopher Payne created and led in 2006–7, then Satya, then Qi Lu, etc.) No surprise, MS might want payback as Google continued to *thrive*.

13/ It makes some sense to have seen LLMs as a broad extension to “answers” as I described in previous thread. But there was a technology mismatch — LLMs are nowhere near ready to provide definitive answers. Google has been struggling with this for decades, hence conservatism.

14/ To me this is an example of a technology mismatch — in time, positioning, and even company — even though the concept/idea is exactly right. It is the way that Windows CE phones were the right concept but entirely wrong implementation, wrong time, wrong company.

15/ Many of the most successful tech companies have collected a large library of right concept, but wrong technology base, wrong time, wrong approach. Often this is viewed as “too early”.

Of course in this context CLIPPY is a fantastic example. You’re welcome.

16/ The thing about these situations is that fans and those involved can years later paint a positive picture of being too early. But really it was just wrong, like Apple Newton wrong, like Windows 8 wrong. Too many elements needed to align to make this right.

17/ Now, and I mean it, I am not trying to be negative about the work. There is amazing work in ChatBot and Sydney was one of the most [unintentionally surprising] innovations in a long time. Like those products above, too many will be quick to close the books on it. DO NOT.

18/ The right answer is to listen to the experiment and market. Now is not the time to move on (as was done with those others). Now is the right time to, for lack of a better word, pivot. Sydney is not about answers. Sydney is not the recipe for Bing to outflank Google/Ads.

19/ Sydney has potential to be a new kind of tool. Combine Sydney w/generative images, audio, video, and it is the genesis of an entire new era of tooling. The hallucinations, bias, randomness, and crazy are FEATURES NOT BUGS. They are what make it an entirely new creative tool.

20/ When we look back on platforms that were successful, most everything that some viewed as flaws and bugs turned out to be the features or the engineering constraints that legit turned a quirky thing into a platform.

21/ The browser *not* having the rendering power of Word was a feature. Lacking a security model was a feature. Lacking centralization was a feature. Broken links led to a whole series of inventions. The fragility of the PC compared to “IBM” unleashed innovation. And on and on.

22/ Everything we’ve seen in the past week or so is horrible if the goal was precise, correct, “Responsible AI approved” answers. Wrong tool, wrong user model, wrong positioning. Ten blue links + targeted ads are way superior — at least for now and the foreseeable future.

23/ But the industry has not seen a new creative tool along the lines of Sydney in a long time — since maybe JPEG. Now would be a great time to unleash the power of this creativity and see what’s created. It might just be the next PC game, Netflix series, or maybe the metaverse.

24/ Is this all nutty crazy talk? Perhaps, but it was a human generated argument so it has those flaws. It is a reaction to a misunderstood failure, a strategy that wasn’t right at the start, what is certain to be a [predictable] overcorrection, and so on. // END

PS/ Now is the time to double-down. Separate from Search. Make Sydney itself more available, cheaper. Build more tooling. Find more developers. Add images, sound, motion, animation. Build a generative future.

PPS/ One of the hallmarks (necessary, not sufficient) of platform shifts is that a set of people emerges (and grows quickly) willing to make the new technology a hobby — to explore, poke, break, learn. To find limits. To create.

Sydney most certainly had those qualities.

PPPS/ Here Sydney is basically writing a script for a new streaming series featuring a belligerent character forced into celebrity whose accountability is challenged by equally belligerent power brokers. 🤔

A summary of potential reasons Bing went so off the rails relative to expectations. My view is that this only further emphasizes the mismatch between the technology and the scenario chosen. “Why *is* Bing so reckless?” @GaryMarcus

“One side felt that web search should remain the way it is while the other pushed for a chat-based interface. Ultimately, Microsoft decided to have both methods of search available and allow people to switch back and forth easily.” // Of course they did.
